Using Solara for Data Science
=============================

Solara offers a few specific tutorials, including one geared towards data science. This page offers a walk-through of Solara’s example found `here `__.

| Use the steps outlined in the previous introduction `section <../../source/user/solara-intro.rst>`__ to:
| 1. Launch a VS Code instance on Notebooks Hub
| 2. Create a new Jupyter notebook (``.ipynb``) or Python script (``.py``) file to create your app
| 3. Install Solara into the desired Python environment

Install Dependencies
--------------------

In this data science-specific tutorial, two additional packages need to be installed: ``plotly`` for visualization and ``pandas`` for data analysis.

-  | **Built-In Environments**
   | These two packages have been pre-installed into the built-in **python-data-science** environments (e.g., ``python-data-science-0.1.8``). Simply ensure the appropriate kernel has been selected inside VS Code (see `here `__).

-  | **Custom Environments**
   | If using custom conda environments created using the instructions found `here <../../source/user/environments.md>`__, install ``plotly`` and ``pandas`` using the terminal. Packages installed into your custom environments do not need to be reinstalled with each new session.

To confirm that these packages are installed, open the terminal, activate the appropriate conda environment by running ``conda activate`` followed by the environment name, and then run ``pip show plotly pandas``.

.. code:: bash

   conda activate python-data-science-0.1.8

.. code:: bash

   pip show plotly pandas

.. figure:: ../../img/vscode/vscode-terminal-pdspp.gif
   :alt: A gif of the terminal output when using pip show

   A gif of the terminal output when using pip show

In a Jupyter notebook, you can also confirm the packages are installed by running ``%pip show plotly pandas`` in a code cell.

.. code:: python

   %pip show plotly pandas

If installation is necessary, use ``pip install`` to install both packages.
This can be done inside the terminal using ``pip install plotly pandas`` or inside a Jupyter notebook cell using ``%pip install plotly pandas``. When using Jupyter, restart the kernel after installation.

.. code:: bash

   pip install plotly pandas  # use this line inside the terminal

.. code:: python

   %pip install plotly pandas  # use this inside a Jupyter notebook cell

Load Dataset
------------

This example uses the canonical ``iris`` dataset. First, import ``solara`` and ``plotly`` at the top of your script. You do not need to import ``pandas`` separately, because ``px.data.iris()`` already returns the dataset as a ``pandas`` DataFrame.

.. code:: python

   import plotly.express as px  # `import ... as` enables abbreviation
   import solara

Next, declare the app's non-reactive variables (reactive variables are added in the next step). Here, we assign the ``iris`` dataset from ``plotly`` to a non-reactive variable named ``df`` and store its column names in ``columns``.

.. code:: python

   df = px.data.iris()
   columns = list(df.columns)

If using a Jupyter notebook, run ``print(columns)`` to see the columns of the ``iris`` DataFrame: ``sepal_length``, ``sepal_width``, ``petal_length``, ``petal_width``, ``species``, and ``species_id``. This line can be deleted or commented out before moving on with the rest of the example.

.. figure:: ../../img/solara/solara-iris-columns.png
   :alt: A screenshot of print(columns) output

   A screenshot of print(columns) output

Add Reactive Variables
----------------------

Next, define reactive variables. You can pass reactive variables to Solara components so that users can interact with them in the rendered app. In this example, the X- and Y-axis columns are stored as global application state. They will be passed to ``Select`` components so the user can choose which column is used for each axis. See more details on state management `here `__.

.. code:: python

   x_axis = solara.reactive("sepal_length")
   y_axis = solara.reactive("sepal_width")

Define the Main Page Component
------------------------------

Now, define the main ``Page()`` component.
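
Conceptually, a reactive variable is an observable value. Reading ``.value`` returns the current state, and setting a new value notifies whatever depends on it. The sketch below models that idea in plain Python using a simple observer pattern, so it runs without Solara installed. ``ToyReactive``, ``page`` and ``renders`` are hypothetical names used only for illustration. This is not how ``solara.reactive`` is implemented internally.

.. code:: python

   from typing import Any, Callable, List


   class ToyReactive:
       """Hypothetical, simplified model of a reactive variable (not Solara's implementation)."""

       def __init__(self, value: Any) -> None:
           self._value = value
           self._listeners: List[Callable[[Any], None]] = []

       @property
       def value(self) -> Any:
           # Reading .value returns the current state
           return self._value

       def set(self, new_value: Any) -> None:
           # Skip notification when the value has not changed
           if new_value == self._value:
               return
           self._value = new_value
           # Notify everything that depends on this variable
           for listener in self._listeners:
               listener(new_value)

       def subscribe(self, listener: Callable[[Any], None]) -> None:
           self._listeners.append(listener)


   x_axis = ToyReactive("sepal_length")
   renders: List[str] = []


   def page() -> None:
       # Hypothetical stand-in for Page(): record which column each render uses
       renders.append(x_axis.value)


   # Re-run page() whenever x_axis changes
   x_axis.subscribe(lambda _new_value: page())

   page()                      # initial render
   x_axis.set("petal_length")  # value changes: page() re-runs
   x_axis.set("petal_length")  # same value: no re-run
   print(renders)  # ['sepal_length', 'petal_length']

In the tutorial's app you do not wire up subscriptions by hand. As described here, Solara tracks which reactive variables the component reads. The ``Select`` widgets, which receive ``x_axis`` and ``y_axis`` as their ``value``, update those variables when the user picks a new column.
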
The component can read the reactive variables defined above, and Solara will “listen” for changes to each variable’s ``.value``. Whenever one of those values changes, Solara re-executes the component function and re-renders the app.

.. code:: python

   @solara.component
   def Page():
       # Scatter plot of the currently selected columns
       fig = px.scatter(df, x_axis.value, y_axis.value)
       solara.FigurePlotly(fig)

       # Pass x_axis and y_axis to the Select components;
       # each Select controls its own reactive variable
       solara.Select(label="X-axis", value=x_axis, values=columns)
       solara.Select(label="Y-axis", value=y_axis, values=columns)

   display(Page())  # only needed to render the app inside a Jupyter notebook; not needed in a .py script

The rendered Solara app should display a scatter plot with interactive dropdown widgets to select each axis.

.. figure:: ../../img/solara/solara-iris-demo-v1.gif
   :alt: A gif showing the iris scatter plot output

   A gif showing the iris scatter plot output

Note: The ``Select`` components come after the ``solara.FigurePlotly()`` component in the code, so they are rendered below the plot.

Enable Data Interaction
-----------------------

Now that we can display a simple scatter plot of our data, we want to add functionality that improves our ability to interact with it.

One way to extract data from the scatter plot is to store the clicked point in a new reactive variable. If you pass ``click_data.set`` as the ``on_click`` callback of ``solara.FigurePlotly``, each click event is stored in ``click_data``. The main ``Page()`` component can then use it to update the plot and display related data. To do so, add the following line of code where the other reactive variables are defined.

.. code:: python

   click_data = solara.reactive(None)

We also want to add an indicator (e.g., ⭐️) that highlights the clicked data point inside the scatter plot. Use an ``if`` statement to check whether a data point has been clicked and, if so, add the indicator:

.. code:: python

   if click_data.value is not None:
       x = click_data.value["points"]["xs"][0]
       y = click_data.value["points"]["ys"][0]

       # add click indicator
       fig.add_trace(px.scatter(x=[x], y=[y], text=["⭐️"]).data[0])

Place the previous lines of code inside the ``Page()`` definition. Put them after ``fig`` is created and before ``fig`` is passed to ``solara.FigurePlotly()``, so the indicator is added to the figure before it is rendered (the same order used in the full example at the end of this page). The ``Page()`` component should now look like this:

.. code:: python

   @solara.component
   def Page():
       fig = px.scatter(df, x_axis.value, y_axis.value)

       # highlight the clicked point, if any
       if click_data.value is not None:
           x = click_data.value["points"]["xs"][0]
           y = click_data.value["points"]["ys"][0]

           # add click indicator
           fig.add_trace(px.scatter(x=[x], y=[y], text=["⭐️"]).data[0])

       # plot the figure and store click events in click_data
       solara.FigurePlotly(fig, on_click=click_data.set)

       # UI selection widgets
       solara.Select(label="X-axis", value=x_axis, values=columns)
       solara.Select(label="Y-axis", value=y_axis, values=columns)

Additionally, we want to define a function that finds the closest neighbors of the clicked data point. Add the following code after the section where reactive variables are defined, but before ``Page()`` is defined. The function computes the Euclidean distance from the clicked point to every row and returns the ``n`` nearest rows (``n=10`` by default). The positional slice ``[1:n+1]`` skips the first row after sorting, which is normally the clicked point itself (distance 0).

.. code:: python

   def find_nearest_neighbours(df, xcol, ycol, x, y, n=10):
       df = df.copy()  # avoid modifying the global DataFrame
       df["distance"] = ((df[xcol] - x)**2 + (df[ycol] - y)**2)**0.5
       # skip position 0 (the clicked point) and keep the next n rows
       return df.sort_values("distance").iloc[1:n+1]

To use this function, call it with ``n=3``. This returns the 3 nearest neighbors if a data point has been clicked, or ``None`` otherwise.

.. code:: python

   if click_data.value is not None:
       df_nearest = find_nearest_neighbours(df, x_axis.value, y_axis.value, x, y, n=3)
   else:
       df_nearest = None

The previous lines of code can be combined with the others inside the ``if click_data.value is not None:`` statement, as shown below:

.. code:: python

   if click_data.value is not None:
       x = click_data.value["points"]["xs"][0]
       y = click_data.value["points"]["ys"][0]

       # add an indicator
       fig.add_trace(px.scatter(x=[x], y=[y], text=["⭐️"]).data[0])

       # find nearest n=3 neighbors
       df_nearest = find_nearest_neighbours(df, x_axis.value, y_axis.value, x, y, n=3)
   else:
       df_nearest = None

We also want to display these neighbors in the app. The following lines of code can be added at the end of the ``Page()`` component.

.. code:: python

   if df_nearest is not None:
       solara.Markdown("## Nearest 3 neighbours")
       solara.DataFrame(df_nearest)
   else:
       solara.Info("Click to select a point")

Execute Combined Code
---------------------

The app’s full code should now look similar to the code shown below. Note the two additional arguments passed to ``px.scatter()``. ``color="species"`` colors points by ``species``. ``custom_data=[df.index]`` attaches each point's row index so that it can be used to extract data in widgets. ``custom_data`` is not visible to the user, but it is included in the figure’s events (e.g., data selection). Additional details can be found `here `__.

.. code:: python

   # import dependencies
   import plotly.express as px  # `import ... as` enables abbreviation
   import solara

   # define global (non-reactive) variables
   df = px.data.iris()
   columns = list(df.columns)

   # define reactive variables
   x_axis = solara.reactive("sepal_length")
   y_axis = solara.reactive("sepal_width")
   click_data = solara.reactive(None)


   # define function to find nearest neighboring data points
   def find_nearest_neighbours(df, xcol, ycol, x, y, n=10):
       df = df.copy()  # avoid modifying the global DataFrame
       df["distance"] = ((df[xcol] - x)**2 + (df[ycol] - y)**2)**0.5
       # skip position 0 (the clicked point) and keep the next n rows
       return df.sort_values("distance").iloc[1:n+1]


   # define app's main component
   @solara.component
   def Page():
       # add scatter plot using plotly express
       fig = px.scatter(df, x_axis.value, y_axis.value, color="species", custom_data=[df.index])

       # use click data, if a point has been clicked
       if click_data.value is not None:
           x = click_data.value["points"]["xs"][0]
           y = click_data.value["points"]["ys"][0]

           # add a star indicator at the clicked data point
           fig.add_trace(px.scatter(x=[x], y=[y], text=["⭐️"]).data[0])

           # reactively obtain nearest neighbors of the clicked data point
           df_nearest = find_nearest_neighbours(df, x_axis.value, y_axis.value, x, y, n=3)
       else:
           df_nearest = None

       # plot figure and store click events in click_data
       solara.FigurePlotly(fig, on_click=click_data.set)

       # enable UI dropdown widgets to select axis columns
       solara.Select(label="X-axis", value=x_axis, values=columns)
       solara.Select(label="Y-axis", value=y_axis, values=columns)

       # show dataframe of the clicked point's nearest n neighbors
       if df_nearest is not None:
           solara.Markdown("## Nearest 3 neighbours")
           solara.DataFrame(df_nearest)
       else:
           solara.Info("Click to select a point")


   # display app inside Jupyter notebook (not needed if using a .py script)
   display(Page())

.. figure:: ../../img/solara/solara-demo-ds.gif
   :alt: A gif demoing the data science example

   A gif demoing the data science example
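
The click handling in the full app comes down to two steps. First, read the clicked coordinates from the event. Then rank the rows by Euclidean distance from that point and drop the point itself. The sketch below reproduces that logic with the standard library only, so it can be tried without Solara, Plotly or pandas. It uses a list of dicts in place of the DataFrame. The helper names ``clicked_point`` and ``nearest_neighbours``, the sample rows and the event payload are all hypothetical. The payload is shaped only after the ``["points"]["xs"]`` / ``["points"]["ys"]`` indexing used above, not after a documented Solara schema.

.. code:: python

   import math
   from typing import Dict, List, Optional, Tuple


   def clicked_point(event: Optional[dict]) -> Optional[Tuple[float, float]]:
       """Return (x, y) of the clicked point, or None if nothing has been clicked.

       Assumes an event shaped like the one indexed in this tutorial:
       event["points"]["xs"][0] and event["points"]["ys"][0].
       """
       if event is None:
           return None
       return event["points"]["xs"][0], event["points"]["ys"][0]


   def nearest_neighbours(
       rows: List[Dict[str, float]], xcol: str, ycol: str, x: float, y: float, n: int = 10
   ) -> List[Dict[str, float]]:
       """Return the n rows closest to (x, y), skipping the closest one (the clicked point)."""
       # Euclidean distance, same formula as find_nearest_neighbours
       ranked = sorted(rows, key=lambda r: math.hypot(r[xcol] - x, r[ycol] - y))
       # Position 0 is normally the clicked point itself (distance 0), so skip it
       return ranked[1:n + 1]


   # Hypothetical sample rows and click event, for illustration only
   rows = [
       {"sepal_length": 5.1, "sepal_width": 3.5},
       {"sepal_length": 4.9, "sepal_width": 3.0},
       {"sepal_length": 6.3, "sepal_width": 3.3},
       {"sepal_length": 5.0, "sepal_width": 3.6},
   ]
   event = {"points": {"xs": [5.1], "ys": [3.5]}}

   point = clicked_point(event)
   if point is not None:
       near = nearest_neighbours(rows, "sepal_length", "sepal_width", *point, n=2)
       print(near)
   else:
       print("Click to select a point")

Python's ``sorted`` is stable, so rows with equal distances keep their original order. By contrast, ``DataFrame.sort_values`` uses quicksort by default and does not guarantee that, so tied neighbors may appear in a different order in the app.
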