Agile Developer for Agile Actors Hellas

Posted on Jun 15 • Edited on Jun 25

Lifting the magic in dashboarding a table.

#python #analytics #jupyter

Prologue

A common task in data analysis, after running the various preprocessing and data testing tasks, is presentation. We will be more specific in this article. We focus on presenting a data frame as a table dashboard in a notebook. We have worked hard in cleaning up and testing a collection of data frames from various data sources. At the very end we join and project our collection as a "gold" data frame that we can use for visualizations. A typical first step is to present our data frame as a table visualization for further inspection. This has been done from one of our customers with the help of a Python package, called Panel and an extension specific for table dashboarding called Tabulator. Part of the Spike we had run was the question of whether we could reproduce the behavior of the said combo through IPywidgets. This is very typical when we work in a sandboxed environment. The Spike turned out to be very interesting and I went on to do some further investigation on my own.
We will analyze this question for an existing example and we will hunt down the various intricacies involved. The conclusions are similar to what I have found. Gear up!

DISCLAIMER: This is reimplementation of the interactivity features of:

https://kdheepak.com/blog/building-dashboards-using-param-and-panel-in-python/

Necessary background on reactivity

Before touching any Jupyter notebook let us first give an introduction to a reactive library that is used in Panel and I will also use for the IPywidgets experiments. It is a very interesting one because it can help a lot with reactive calculations. It is higher level than Rx, still, it covers enough common use cases. Here is the conceptual diagram of the reactive behavior.

Here we have 5 parameters. Parameters 1, 2 and 3 are free. But parameter 4 depends on 1 and 2, while 5 depends on 4 and 3. This dependency graph (DAG as data engineers call it) is of paramount importance for performance reasons. When a free parameter changes, not all dependent parameters need to be recomputed. We only recompute what needs to be recomputed. So If I change parameter 3 only 5 needs to be recomputed. This is exactly the functionality of Param library, and now we are ready to give the code example. It is here.

class MoviesPanel(param.Parameterized):
    start_year = param.Integer()
    end_year = param.Integer()
    filtered_df = param.DataFrame()

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        basics = pd.read_csv("./data/title.basics.tsv.gz", sep="\t", nrows=500)
        ratings = pd.read_csv("./data/title.ratings.tsv.gz", sep="\t", nrows=500)
        self.df = basics.merge(ratings, on="tconst").dropna()
        self.min_year = int(self.df["startYear"].min())
        self.max_year = int(self.df["startYear"].max())
        self.start_year = self.min_year
        self.end_year = self.max_year

    @param.depends("start_year", "end_year", watch=True)
    def tabulating(self):
        self.filtered_df = (
            self.df.query(f"startYear >= {self.start_year}")
            .query(f"startYear <= {self.end_year}")
            .sort_values(by=["startYear"])
        )

We have 3 params start_year, end_year and filtered_df. The filtered_df is the dependent param that needs to be re-computed. This happens with a function that should update only filtered_df as a side effect and is annotated with

 @param.depends("start_year", "end_year", watch=True)

There is a dataset (I have some articles on this) encoded as described before with the "gold" data frame self.df and we reactively subset it, based on the startYear attribute, with the reactive window specified by the two free params. Here is a sample execution

if __name__ == "__main__":
    m = MoviesPanel()
    print(f"Year bounds are {(m.min_year, m.max_year)}") #Year bounds are (1892, 1912)
    print(f"Rows are {len(m.filtered_df)}") #Rows are 486
    m.start_year = 1895
    print(f"New rows are {len(m.filtered_df)}") #New rows are 475

Putting reactivity to work

Now, having presented the basics, it is time for the leap of faith. Panel is built around Param. When Panel gets a parametrized class with parametrized methods like above, it knows how to observe the params of the reactive functions for changes even if we shut down the watcher (and avoid double computations). Let's decide upon our presentation logic. What we need to do is to display the data frame and observe it for changes when the start or end year change. Normally a display of the filtered data frame would not be enough, because presenting 10 or 20 lines do not capture the whole data frame. Presenting the whole data frame is not an option either, especially if it is large. This is exactly the problem Panel with the Tabulator combo aims to solve.
Let us first present the end result. A picture is a thousand words after all.

Here the Tabulator presents the data frame in a paged way and also reacts to changes of Start/End Year. This has been run on OpenColab. But you can also work locally with Jupyter Notebook/ Jupyter Lab or VSCodium. It is pretty easy to setup and we will need it later. For Windows 11 we first need to enable long file support like this. Then, we set up your virtual environment and the corresponding kernel (the assumption is that Jupyter Notebook is already installed with widgetsnbextension). My installation works on Python 3.14.6.

cd panel-and-ipywidgets-experiments
uv venv --python 3.14
.venv\Scripts\activate
python -m ensurepip
python -m pip install -U -r requirements.txt
python -m ipykernel install --user --name panels314

Let's revisit the code. (whole code is here

import pandas as pd
import panel as pn
import param

pn.extension("tabulator", "ipywidgets")

class MoviesPanel(param.Parameterized):
    start_year = param.Integer()
    end_year = param.Integer()

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        basics = pd.read_csv("./data/title.basics.tsv.gz", sep="\t", nrows=500)
        ratings = pd.read_csv("./data/title.ratings.tsv.gz", sep="\t", nrows=500)
        self.df = basics.merge(ratings, on="tconst").dropna()
        self.start_year = int(self.df["startYear"].min())
        self.end_year = int(self.df["startYear"].max())

    @param.depends("start_year", "end_year", watch=False)
    def tabulating(self):
        print("Tabulating")
        filtered_df = (
            self.df.query(f"startYear >= {self.start_year}")
            .query(f"startYear <= {self.end_year}")
            .sort_values(by=["startYear"])
        )
        return pn.widgets.Tabulator(
            filtered_df,
            pagination="remote",
            page_size=5,
            disabled=True,
        )


m = MoviesPanel()
pn.Column(
    pn.Row(
        m.param.start_year,
        m.param.end_year,
    ),
    m.tabulating,
)

While the last lines are pretty typical widget setup code, the unusual part is the tabulating function. Previously we created a side effect. Now we return a result. It is exactly what we said above about Panel and Param being friends. Upon changes, Panel detects that the function (playing also the role of a widget) is called. The call is intercepted, the result is presented. It can be a value, or in this case a widget. There is a lot of "voodoo" done here. We will see later how we can achieve similar results without any magic wand!!! The Tabulator widget takes a data frame and some presentation hints and that's all. See more here.

BAD NEWS: This will not work in Databricks Free or not because Tabulator uses Javascript and Databricks is sandboxed. It also does not work on Nteract. Jupyter Lab/Notebook did not work for me.

VSCodium is fine if you follow the above instructions and use the virtual environment.

But panel has an ace up on its sleeve. If you encounter notebook problems, you can serve it and even watch for changes while you develop.

panel serve panel-example.ipynb

Lifting the magic with IPywidgets

As mentioned previously, part of the Spike was to port a code base to IPywidgets from Panel. Let's try this in a non-magical way (old school callbacks). We keep the class as before, a simple reactive class. No acrobatics or widgets. We need first two sliders. One for start year and one for the end year. Let's see the start year (whole code is here).

# Setup the slider
start_year_slider = widgets.IntSlider(
    value=m.min_year,
    min=m.min_year,
    max=m.max_year,
    step=1,
    description="Start Year:",
    disabled=False,
    continuous_update=False,
    orientation="horizontal",
    readout=True,
    readout_format="d",
)

# Create the callback
def interact_start_year(change):
    m.start_year = change.new
    redraw_df()


# Connect the callback with the slider
start_year_slider.observe(interact_start_year, names="value")

min and max year are static. Nothing reactive. The real work is done in the callback. There we update the parametrized class (so reactivity kicks in) and then we redraw the data frame, ourselves.

m = MoviesPanel()

output = widgets.Output()

def redraw_df():

    with output:
        display(m.filtered_df, clear=True)

We need a place-holder widget, called Output in IPywidgets terminology, for the IPython output to take place (with the display(...) function).

Then we assemble everything as before

redraw_df()
widgets.VBox([widgets.HBox([start_year_slider, end_year_slider]), output])

As you can see in the screenshot we partially achieved our goal because the output is not a table dashboard, even though the sliders are there and work fine. We can improve upon this, while staying in IPywidget territory, see Appendix.

GOOD NEWS: This code works every where. Even on Databricks

WARNING: Using Jupyter and VSCodium together may mixup kernel communication and result in render artefacts.

IPywidgets get along with Tabulator

So, if we can almost replicate the behavior with IPywidgets, can we replicate the appearance of it too? Panel uses the Tabulator which is a Javascript library. The documentation lists 4 ingredients for a successful visualization:

The Javascript library
The corresponding stylesheets (we will use the default ones)
A with an identifier as a placeholder for the chart.
The script that calls the library with our data.

Unfortunately the approach does not work in OpenColab because it is a sandboxed environment. We cannot load arbitrary web assets and we do not have a formal package that includes them. For this reason we aim for VScodium and Jupyter Notebook or Jpupyter Lab. There are two approaches.

Create an HTML file that will be rendered as an IFrame in the Output
Directly render our script after having set up the in Output, because it accepts HTML fragments.

The first option will not work at all for VSCodium because of it is affected by https://github.com/microsoft/vscode/issues/154722, even though the generated HTML file (testme.html) works fine when opened with the Integrated Web Browser. The second option did not work in our tests for similar reasons.

Our only hope now is Jupyter Notebook/Lab. Let's see the code for the first case. We do not cover the second case. We focus on the redraw function

def redraw_df():
     data_records = m.filtered_df.to_json(orient="records")
    column_spec = json.dumps(
        [
            {"title": some_column, "field": some_column, "align": "center"}
            for some_column in list(m.filtered_df.columns)
        ]
    )

    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("templated.html")
    some_html = template.render(data_records=data_records, column_spec=column_spec)

    with open("testme.html", "w") as f:
        f.write(some_html)
    print("render")
    with output:
        display(IFrame(src="./testme.html", width=1500, height=250), clear=True)

We use Jinja2 so as to template our HTML and not pollute our code.

For scrollability, like in the original we use this layout trick

widgets.VBox([widgets.HBox([start_year_slider, end_year_slider]), output], layout = widgets.Layout(overflow='auto'))

We fill the various parameters as described in Tabulator Doc, save the rendered HTML, and then we load it to an IFrame. Everytime we change the sliders, the IFrame is reloaded. Here is the output. Very close to what we aimed at.

BAD NEWS: This will work only in Jupyter Notebook or Lab

Epilogue

As you can see from the above discussion the solutions are sensitive to the platform you are running them. Databricks is heavily sandboxed, while Opencolab has relaxed rules. Fortunately, the "naive" approach works always. This is not the case for the other two. Also another observation is that you should normally work with VSCodium or Jupyter in separate folders. In my case the Jupyter checkpoints seem to interfere. The problems seen here took a lot to resolve and I strongly advise you to not accept something without testing. As always the code is available for you to download, execute and report any problems. My expectation is that now, you know better where you stand in this minefield.

Appendix

We can tabulate our data while staying in IPywidgets territory. This takes advantage of Param and the dynamicity of the widget set. Here is a possible implementation with a corresponding snapshot.

The idea behind this approach is that the user selects the page size (rows of data frame to be displayed) and navigates across pages. We use the Param library again to provide a paged data frame.

INITIAL_PAGE_SIZE = 6
PAGE_SIZES = [6, 12, 18]

class PagedMoviesPanel(param.Parameterized):
    page_index = param.Integer(1)
    page_size = param.Integer()
    num_pages = param.Integer()
    filtered_df = param.DataFrame()

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        basics = pd.read_csv("./data/title.basics.tsv.gz", sep="\t", nrows=500)
        ratings = pd.read_csv("./data/title.ratings.tsv.gz", sep="\t", nrows=500)
        self.df = (
            basics.merge(ratings, on="tconst")
            .dropna()
            .sort_values(by=["startYear"])
            .reset_index(drop=True)
        )
        self.page_size = INITIAL_PAGE_SIZE

    @param.depends("page_index", "page_size", watch=True)
    def tabulating(self):
        pos = (self.page_index - 1) * self.page_size
        self.filtered_df = self.df.iloc[pos : pos + self.page_size]

    @param.depends("page_size", watch=True)
    def paging(self):
        self.num_pages = (len(self.df) + self.page_size - 1) // self.page_size

So much about the reactivity. The even more interesting thing comes later. From the callback, and taking the order of operations into consideration, one can use the dynamicity of IPywidgets

def interact_page_size(change):
    new_page_size = int(change.new)
    page_slider.value = 1
    m.page_index = 1
    m.page_size = new_page_size
    page_slider.max = m.num_pages
    redraw_df()

First we reset the page slider and the page index, then we set the new page size on the parametrized class and finally we get back the new max of the page slider. See the code how this idea is implemented. You can also select columns to be displayed, through a multi-select.

GOOD NEWS: This code works every where. Even on Databricks