One-Stop Guide for Plotly and Dash Text Dataset Visualization Using Big Query and Flask: Good to Great

Jayce Jiang
13 min read · Jun 3, 2020


Background:

In the previous tutorial, we built an advanced data extraction pipeline with Airflow and discussed the different types of data engineering frameworks. If you are interested in Google Cloud Platform with Airflow, you can check out the first and second posts of this blog series.

In this tutorial, we will build a visualization with the Twitter data we harvested in the previous blog post. I will show you how to create an app using the Python visualization libraries Plotly and Dash, all within the Flask framework. Visuals will include frequency bar charts and word cloud plots. I will also give an overview of some handy Dash features, such as callbacks with multiple inputs and outputs, cross-filtering, and data sharing between callbacks.

Data Visualization Is More Important Than Ever

We are in the age of Big Data, a time when companies like Google and Facebook harvest petabytes of data from billions of individuals’ web searches and demands barked at IoT devices. Visualization skills have grown increasingly vital as the ability to interpret and analyze this unprecedented amount of incoming data has only become more challenging. Proper visualization aids analysts in making sense of the data, leading to better storytelling and ultimately more informed decision-making by you and your stakeholders.

Let’s dive in and build our story, starting with quick bullets on Plotly and Dash, the two Python visualization libraries we will be using. I will also cover the advantages and drawbacks of Plotly/Dash versus other visualization tools such as Tableau and Qlik.

Plotly is a visualization library used to build many graphs, including 3D scatter plots, bar charts, radar charts, and much more.

Dash is a Python library that complements Plotly by providing interactive components within a Flask framework.

  • Plotly is built on D3.js, so it is more versatile than Tableau’s drag-and-drop interface. However, Tableau doesn’t require writing code, which can be more comfortable for non-coders. Building complex, multi-field input/output charts in Plotly is difficult without basic coding ability.
  • Plotly offers more control and flexibility. However, off-the-shelf tools such as Tableau and Qlik are much better for prototyping and more reusable. Reusing a Plotly dashboard on a different project usually means rewriting it entirely, which is a significant downside. For example, if your boss asks a follow-up question while you are presenting a report in a meeting, tools such as Tableau or Power BI let you answer it by adding a dynamic filter with a few clicks.
  • The Plotly library allows for statistical and AI analysis, including curve fitting, moving averages, and integration of modeling results. It is excellent for people who already use Python.
  • Plotly is relatively easy to pick up, and the developers provide plenty of documentation with examples for each visual, but the more advanced documentation is difficult to read.

Every tool has its pros and cons. I recommend readers follow up with additional research to determine the best tool for them. In my case, Plotly and Dash were the clear winners, as I am already coding transformations, statistical analysis, and AI models with python. Plus, I prefer the customizability in comparison to enterprise viz tools like Tableau.

Setting Up Plotly with the BigQuery API on Flask

Let’s clone the Flask boilerplate from GitHub onto our local computer.

git clone https://github.com/realpython/flask-boilerplate.git
cd flask-boilerplate

Initialize and activate a virtual environment to isolate the project and avoid dependency issues, using either virtualenv or the Anaconda Prompt.

# virtualenv (isolated from site-packages by default; the old --no-site-packages flag was removed in virtualenv 20)
virtualenv env
source env/bin/activate
# Anaconda Prompt
conda create --name myenv python=3.7
conda activate myenv

Before installing the dependencies, add the following lines to the requirements.txt file.

google-api-python-client==1.6.2
pandas==0.23.4
plotly==4.2.1
google-cloud-storage==1.28.1
google-cloud-bigquery==1.24.0
nltk==3.5
dash==1.12.0
dash-core-components==1.10.0
dash-html-components==1.0.3

Install the dependencies and run the Flask App. Go to http://localhost:5000

pip install -r requirements.txt
python app.py

Simple Graph Using Plotly:

Now that our Flask app is set up, we can use basic Plotly to build an interactive graph from any data in BigQuery.

Let’s get started.

First, add the following block of code to app.py to access the BigQuery dataset. Be sure to use your own project_id and credentials_path. I created a folder called dashboard and saved my credentials JSON file in it.

from google.cloud import bigquery
from google.oauth2 import service_account
project_id='trusty-charmer-276704' #remember to use your own project ID
credentials_path='./dashboard/gcp.json' #use your own bigquery credential json file
credentials = service_account.Credentials.from_service_account_file(credentials_path) if credentials_path else None
client = bigquery.Client(project=project_id, credentials=credentials)

Then modify the home route controller inside app.py. The import path assumes create_plot lives in dashboard.py inside the dashboard folder.

from dashboard.dashboard import create_plot


@app.route('/')
def home():
    """Landing page."""
    table, plot = create_plot(client)
    return render_template(
        'pages/placeholder.home.html',
        title='Plotly Dash Flask Tutorial',
        description='Embed Plotly Dash into your Flask applications.',
        template='home-template',
        body="This is a homepage served with Flask.",
        plot=plot,
        table=table
    )

Now we need to modify placeholder.home.html to display the graph by adding these few lines.

<head lang="en">
  <meta charset="UTF-8">
  <title>My First Dashboard</title>
  <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
</head>
{% extends 'layouts/main.html' %}
{% block title %}Home{% endblock %}
{% block content %}
<div class="row">
  <div class="col-md-12">
    <div id="bargraph">
      <script>
        var graphs = {{plot | safe}};
        Plotly.newPlot('bargraph', graphs, {});
      </script>
    </div>
  </div>
</div>
<div class="row">
  <div class="col-lg-12">
    <div id="table">
      <script>
        var table = {{table | safe}};
        Plotly.newPlot('table', table, {});
      </script>
    </div>
  </div>
</div>
{% endblock %}

Then create dashboard.py in the dashboard folder created earlier.
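The contents of dashboard.py are not reproduced here, so the following is only a minimal sketch of what create_plot could look like, not the original implementation. It assumes the table has a tweet column (as the later Dash examples do), skips stop-word removal, and builds the trace JSON by hand with the standard library; count_words and QUERY are hypothetical names. The function returns two JSON strings, in the (table, plot) order the home route expects, each holding a list of traces for Plotly.js.

```python
import json
import re
from collections import Counter

# Assumption: same table as the Dash examples later in this post.
QUERY = "SELECT tweet FROM tweetScraper.tweet"


def count_words(tweets, top_n=100):
    """Return the top_n (word, count) pairs, lowercased, punctuation removed."""
    counter = Counter()
    for text in tweets:
        cleaned = re.sub(r"[^\w\s]", "", text.lower())
        counter.update(cleaned.split())
    return counter.most_common(top_n)


def create_plot(client):
    """Query BigQuery and return (table_json, bar_json) for the template."""
    rows = client.query(QUERY).result()  # iterating yields rows with key access
    top_words = count_words(row["tweet"] for row in rows)
    words = [word for word, _ in top_words]
    counts = [count for _, count in top_words]

    # Each value is a list of traces, the second argument Plotly.js expects.
    bar = [{"type": "bar", "x": words, "y": counts}]
    table = [{
        "type": "table",
        "header": {"values": ["word", "count"]},
        "cells": {"values": [words, counts]},  # one inner list per column
    }]
    return json.dumps(table), json.dumps(bar)
```

If you build the figures with plotly.graph_objects instead, json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder) is the usual way to serialize them.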

Here is the result you should see in your browser.

Mid-Example: Moving On to Dash:

What if I told you that you can have everything Plotly offers without having to build the related UI elements, such as dropdowns and sliders, yourself?

Dash saves the day! It allows you to build an interactive dashboard with ease. Dash is written on top of Flask, although it is usually run as an independent data visualization app. You can also embed a Dash app in our existing Flask app, so you don’t need to run two or more instances of Flask.

Let’s register our separate Dash app with our parent Flask app by adding these few lines to app.py. They pass our Flask instance to the create_dashboard function as a parameter.

# dash
with app.app_context():
    # Import Dash application
    from dashboard.dashboard import create_dashboard
    app = create_dashboard(app, client)

In the dashboard.py file, add a new function called create_dashboard() that passes our Flask instance into Dash as its server. This creates a Dash instance on top of our Flask app’s core.

import dash
import dash_table
import dash_html_components as html
import dash_core_components as dcc
from .dashboard_layout import html_layout


def create_dashboard(server, client):
    dash_app = dash.Dash(server=server,
                         routes_pathname_prefix='/dashboard/')

Now let's create the same frequency graph as the Plotly example but in Dash. Here is a complete function.

from nltk.corpus import stopwords  # run nltk.download('stopwords') once beforehand
from .dashboard_layout import html_layout


def create_dashboard(server, client):
    dash_app = dash.Dash(server=server,
                         routes_pathname_prefix='/dashboard/',
                         external_stylesheets=[
                             '../static/css/main.css'
                         ]
                         )
    # Prepare a DataFrame
    query = """
    SELECT *
    FROM tweetScraper.tweet
    """
    df = client.query(query).to_dataframe()
    stop = stopwords.words('english')
    # Convert to lowercase and drop stop words
    df['newTweet'] = df['tweet'].apply(lambda x: ' '.join([word for word in x.lower().split() if word not in stop]))
    # Remove punctuation
    df['newTweetRemove'] = df['newTweet'].str.replace(r'[^\w\s]', '', regex=True)
    # Count words and keep the 100 most frequent
    countFreq = df.newTweetRemove.str.split(expand=True).stack().value_counts()[:100]
    # Custom HTML layout
    dash_app.index_string = html_layout
    # Create Layout
    dash_app.layout = html.Div(
        children=[dcc.Graph(
            id='frequency_word_bargraph',
            figure={
                'data': [
                    {
                        'x': countFreq.index,
                        'y': countFreq.values,
                        'name': 'first example in Dash',
                        'type': 'bar'
                    }
                ],
                'layout': {
                    'title': 'Coronavirus Tweet Word Frequency'
                }
            })
        ],
        id='dash-container'
    )
    return dash_app.server

Here is a brief summary of the two main Dash component libraries: dash_core_components and dash_html_components.

  • dash_core_components: the higher-level interactive elements, generated with the React.js library. It contains the graph, tabs, dropdown, and other input/output components.
  • dash_html_components: has a component for every HTML tag and generates the corresponding HTML element in the application. html.Div and html.H3 are examples of these components.

In the above example, we use the dcc.Graph and html.Div components to create the layout of our dashboard.

Let’s create a custom template file called dashboard_layout.py. It holds the same markup as the main.html file inside the templates folder, but stored in a variable called html_layout. For the full file, use this link.

html_layout = '''
<!DOCTYPE html>
<html>
<head>
{%metas%}
<title>{%title%}</title>
{%favicon%}
{%css%}
........
</head>
............
{%app_entry%}</div><footer>
{%config%}
{%scripts%}
{%renderer%}
... </footer>
</body>
</html>
'''

Finally, let’s add custom CSS to main.css in the static/css/ folder. This is how you apply custom CSS to the Dash HTML components.

/* need to overwrite css in dash-container and table so bootstrap works */
#dash-container {
    margin-top: 50px;
}
.dash-table-container .row {
    margin: 0;
}

If you remember, we have a table showing our dataframe in our first example in Plotly. Let’s create a similar table inside Dash.

def generate_table(df):
    """Create Dash datatable from Pandas DataFrame."""

    table = dash_table.DataTable(
        id='database-table',
        columns=[{"name": i, "id": i} for i in df.columns],
        data=df.iloc[:30].to_dict('records'),
        sort_action="native",
        sort_mode='multi',  # valid values are 'single' and 'multi'
        row_selectable="multi",
        style_table={
            'maxHeight': '50ex',
            'overflowY': 'scroll',
            'width': '100%',
            'minWidth': '100%',
            'display': 'block'
        },
        style_header={
            'fontWeight': 'bold',
            'border': 'thin lightgrey solid',
            'backgroundColor': 'rgb(100, 100, 100)',
            'color': 'white'
        },
        style_cell={
            'fontFamily': 'Open Sans',
            'textAlign': 'left',
            'width': '100px',
            'minWidth': '10px',
            'maxWidth': '180px',
            'whiteSpace': 'normal',
            'backgroundColor': 'rgb(230, 230, 250)'
        }
    )
    return table

Now we need to use Bootstrap to format our dashboard. Here is the updated create_dashboard function.

def create_dashboard(server, client):
    """Create a Plotly Dash dashboard."""
    dash_app = dash.Dash(server=server,
                         routes_pathname_prefix='/dashboard/',
                         external_stylesheets=[
                             '../static/css/main.css',
                         ]
                         )
    # Prepare a DataFrame
    query = """
    SELECT *
    FROM tweetScraper.tweet
    """
    df = client.query(query).to_dataframe()
    stop = stopwords.words('english')
    # Convert to lowercase and drop stop words
    df['newTweet'] = df['tweet'].apply(lambda x: ' '.join([word for word in x.lower().split() if word not in stop]))
    # Remove punctuation
    df['newTweetRemove'] = df['newTweet'].str.replace(r'[^\w\s]', '', regex=True)
    # Count words and keep the 100 most frequent
    countFreq = df.newTweetRemove.str.split(expand=True).stack().value_counts()[:100]
    # Custom HTML layout
    dash_app.index_string = html_layout
    # Create Layout
    dash_app.layout = html.Div([
        html.Div([
            dcc.Graph(
                id='frequency_word_bargraph',
                figure={
                    'data': [
                        {
                            'x': countFreq.index,
                            'y': countFreq.values,
                            'name': 'first example in Dash',
                            'type': 'bar'
                        }
                    ],
                    'layout': {
                        'title': 'Coronavirus Tweet Word Frequency'
                    }
                },
                style={
                    'display': 'block'
                },
                className='twelve columns'
            )],
            className='row'),
        html.Div([
            html.Div(className='col-sm-0 col-md-1 col-lg-1'),
            html.Div([generate_table(df)], className='col-sm-12 col-md-10 col-lg-10'),
            html.Div(className='col-sm-0 col-md-1 col-lg-1')], className='row')],
        id='dash-container'
    )
    return dash_app.server

Now you should see the following in the localhost server.

Advanced Example: Dash Advanced Topics

Let’s make dynamic graphs with a basic dash callback function.

Basic Dash Callbacks:

Callbacks make your Dash apps more interactive: they take the values of input components and use them to update the dashboard’s graph output.

I will show you a simple example that uses a slider to dynamically change the output of the frequency bar graph from the example above.

Since we use Flask in conjunction with Dash, we need to modify our Dash file structure. It is relatively simple: call init_callbacks from inside the create_dashboard function, then define your callbacks as usual inside init_callbacks. Here is the template.

import dash
from dash.dependencies import Input, Output
import dash_table
import dash_html_components as html


def create_dashboard(server):
    app = dash.Dash(server=server, routes_pathname_prefix='/dashboard/')
    app.layout = html.Div([
        # ... Layout stuff
    ])

    # Initialize callbacks after our app is loaded
    # Pass the Dash app as a parameter
    init_callbacks(app)

    return app.server


def init_callbacks(app):
    @app.callback(
        # ... Callback input/output
    )
    def update_graph():
        # ... Insert callback stuff here
        pass

Now that you know the template, let’s get started with the slider example. We need to add this block of code inside the init_callbacks function:

def init_callbacks(app, countFreq):
    @app.callback(
        dash.dependencies.Output('frequency_word_bargraph', 'figure'),
        [dash.dependencies.Input('range_frequency_number', 'value')])
    def update_graph(value):
        # value is the [low, high] pair selected on the RangeSlider
        newGraph = countFreq[value[0]:value[1]]
        return {
            'data': [
                {
                    'x': newGraph.index,
                    'y': newGraph.values,
                    'name': 'first example in Dash',
                    'type': 'bar'
                }
            ],
            'layout': {
                'title': 'Coronavirus Tweet Word Frequency'
            }
        }

Then we need to change the dash_app.layout to include the RangeSlider:

    # Create Layout
    dash_app.layout = html.Div([
        html.Div([
            dcc.Graph(id='frequency_word_bargraph', style={'display': 'block'}),
            dcc.RangeSlider(
                id='range_frequency_number',
                min=0,
                max=100,
                step=1,
                value=[0, 100],
                marks={
                    0: {'label': 'Top(min)', 'style': {'color': '#77b0b1'}},
                    50: {'label': 'Top(50)'},
                    100: {'label': 'Top(Max)', 'style': {'color': '#77b0b1'}}
                },
                allowCross=False
            )],
            className='row'),
        html.Div([
            html.Div(className='col-sm-0 col-md-1 col-lg-1'),
            html.Div([generate_table(df)], className='col-sm-12 col-md-10 col-lg-10'),
            html.Div(className='col-sm-0 col-md-1 col-lg-1')], className='row'),
        # Hidden div inside the app that stores the intermediate value
        html.Div(id='intermediate-value', style={'display': 'none'})],
        id='dash-container'
    )
    init_callbacks(dash_app, countFreq)

Sharing Data Between Callbacks

As you saw in my init_callbacks function, data can be handed to callbacks from outside. There are three main ways to share variables and data between different callbacks.

  • Storing data in the browser with a hidden Div: the data is converted to a string such as JSON. This incurs network transport cost but does not increase the memory footprint of the app.
  • Computing aggregations upfront: better than storing the entire dataset in the browser. Compute the aggregations on the server first, then send only the aggregation to the client side. You rarely need the entire dataframe, since your app will usually display only a subset or an aggregation of the computed/filtered data.
  • Caching and signaling, which comes in three different types:
  1. Lru_Cache
  2. Flask_Caching: Local System or Redis
  3. User-Based Client Caching

I used the second method: computing the aggregation upfront and passing it down into our callback function. For implementations of the other methods, you can refer to this documentation.
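As a rough illustration of the upfront-aggregation approach, the sketch below computes the word counts once, outside any callback, and leaves the callback with nothing to do but slice the result. It uses only the standard library so that it runs on its own; precompute_counts and make_update_graph are hypothetical names, and in the real app the inner function would carry the @app.callback decorator and work on the pandas Series instead of a list.

```python
from collections import Counter


def precompute_counts(tweets, top_n=100):
    """Aggregate once at start-up: (word, count) pairs, most frequent first."""
    counter = Counter()
    for text in tweets:
        counter.update(text.lower().split())
    return counter.most_common(top_n)


def make_update_graph(count_freq):
    """Return a callback body that closes over the precomputed aggregate."""
    def update_graph(value):
        start, end = value  # the RangeSlider sends [low, high]
        window = count_freq[start:end]
        return {
            'data': [{
                'x': [word for word, _ in window],
                'y': [count for _, count in window],
                'type': 'bar',
            }],
            'layout': {'title': 'Coronavirus Tweet Word Frequency'},
        }
    return update_graph


# The aggregation happens once; each slider move only slices the small list.
update_graph = make_update_graph(
    precompute_counts(["stay home", "stay safe", "stay home"]))
figure = update_graph([0, 2])
```

Because only the small aggregate is captured, the full dataframe never has to travel to the browser or be recomputed per request.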

Once you have added these two blocks of code, your dashboard should change dynamically with the range slider, as in the following video.

WordCloud Example — Advanced Dash Callbacks and Error Catching:

Let’s add a more interesting graph to our application: a word cloud plot built from our dataset.
First, we need the WordCloud library, so add the following to requirements.txt and reinstall.

#requirements.txt
wordcloud
pip install -r requirements.txt

The WordCloud library utilizes matplotlib, and Plotly currently does not support this type of figure. Thus, we need to convert it into a PNG image. Here are the new dash_app layout and callbacks functions.

from wordcloud import WordCloud
# Display the generated image:
# the matplotlib way:
import matplotlib.pyplot as plt
import io
import base64
...
    # Create Layout
    dash_app.layout = html.Div([
        html.Div([
            html.Div(
                # Dash style keys are camelCased, hence maxHeight
                [html.Img(id='matplotlib-graph', className="img-responsive", style={'maxHeight': '520px', 'margin': '0 auto'})], className='col-sm-12 col-md-6 col-lg-6'),
            html.Div(
                [dcc.Graph(id='frequency_word_bargraph', style={'display': 'block'}),
                 dcc.RangeSlider(
                     id='range_frequency_number',
                     min=0,
                     max=100,
                     step=1,
                     value=[0, 100],
                     marks={
                         0: {'label': 'Top(min)', 'style': {'color': '#77b0b1'}},
                         50: {'label': 'Top(50)'},
                         100: {'label': 'Top(Max)', 'style': {'color': '#77b0b1'}}
                     },
                     allowCross=False)], className='col-sm-12 col-md-6 col-lg-6')
        ], className='row'),
        html.Div([
            html.Div(className='col-sm-0 col-md-1 col-lg-1'),
            html.Div([generate_table(df)], className='col-sm-12 col-md-10 col-lg-10'),
            html.Div(className='col-sm-0 col-md-1 col-lg-1')], className='row'),
        # Hidden div inside the app that stores the intermediate value
        html.Div(id='intermediate-value', style={'display': 'none'})],
        id='dash-container'
    )
...

Multi-outputs Callbacks

It is as simple as listing multiple Output objects in the callback decorator and returning one value per Output, in the same order.
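The callback code itself is not reproduced here, so here is a hedged sketch of how the two return values could be produced. It assumes the word counts are available as a plain word-to-count dict (for the pandas Series used earlier, countFreq[a:b].to_dict() would give one); render_wordcloud_png, png_to_src, and build_outputs are hypothetical helper names. The image is rendered to PNG bytes and handed to html.Img as a base64 data URI.

```python
import base64
import io


def render_wordcloud_png(frequencies):
    """Render a word -> count mapping to PNG bytes with the wordcloud package."""
    from wordcloud import WordCloud  # lazy import; needs `pip install wordcloud`
    cloud = WordCloud(width=480, height=360, background_color='white')
    image = cloud.generate_from_frequencies(frequencies).to_image()
    buffer = io.BytesIO()
    image.save(buffer, format='PNG')
    return buffer.getvalue()


def png_to_src(png_bytes):
    """Encode PNG bytes as a data URI usable as the src of html.Img."""
    return 'data:image/png;base64,' + base64.b64encode(png_bytes).decode('ascii')


def build_outputs(count_freq, value, render_png=render_wordcloud_png):
    """Return (figure, src): one value per Output, in declaration order."""
    window = dict(list(count_freq.items())[value[0]:value[1]])
    figure = {
        'data': [{'x': list(window), 'y': list(window.values()), 'type': 'bar'}],
        'layout': {'title': 'Coronavirus Tweet Word Frequency'},
    }
    return figure, png_to_src(render_png(window))
```

Inside the app, a callback declared with Output('frequency_word_bargraph', 'figure') and Output('matplotlib-graph', 'src') could simply return build_outputs(...). Note that the wordcloud package raises a ValueError when given no words, so an empty slider range is worth catching.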

Now you should see the word cloud regenerate as the range slider moves. Here is what it looks like. It is a bit slow, since a brand new WordCloud object has to be created on every update.

Multi-Inputs Callbacks: Determine which Input has fired

You can easily use multiple inputs in a callback. Simply add your desired inputs to the callback decorator as a list. The decorated function then receives one argument per input.

@app.callback(Output('container', 'children'),
              [Input('btn-1', 'n_clicks'),
               Input('btn-2', 'n_clicks'),
               Input('btn-3', 'n_clicks')])
def example(btn1, btn2, btn3):
    ctx = dash.callback_context
    if not ctx.triggered:
        button_id = 'No clicks yet'
    else:
        button_id = ctx.triggered[0]['prop_id'].split('.')[0]
    # ctx.triggered is a list of dicts, each containing prop_id and value
    # ctx.states holds the values of the State arguments
    # ctx.inputs holds the values of all the inputs
    return button_id

As the function body shows, you can determine which input fired by using Dash’s internal callback context.

Memoization from Cache: Lru_cache

Since Dash’s callbacks are functional and do not hold any state, you can quickly improve performance with memoization, which bypasses long computations by storing the results of function calls. On Python 3, memoization is available through the built-in functools library (functools32 is the Python 2 backport).

import functools


@functools.lru_cache(maxsize=32)
def slow_function(input):
    # long computation like the image generation below
    return 'Input was {}'.format(input)

Calling slow_function for the first time takes a considerable amount of time, depending on the computation. However, calling the same function with the same input again is instant, since the previously computed results are saved in memory.
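A small self-contained check makes the behavior visible. This is only an illustrative sketch using the standard library’s functools.lru_cache on Python 3; call_count and cached_for_slider are hypothetical names. One practical detail: lru_cache needs hashable arguments, and a RangeSlider value arrives as a list, so the sketch converts it to a tuple first.

```python
import functools

call_count = 0  # counts how often the expensive body actually runs


@functools.lru_cache(maxsize=32)
def slow_function(value_range):
    """Stand-in for an expensive step such as rendering the word cloud."""
    global call_count
    call_count += 1
    return 'Input was {}'.format(value_range)


def cached_for_slider(value):
    # Lists are unhashable, so convert the slider value to a tuple
    # before it reaches the cached function.
    return slow_function(tuple(value))


first = cached_for_slider([0, 50])   # computed
second = cached_for_slider([0, 50])  # served from the cache
```

slow_function.cache_info() reports hits and misses, which is handy for confirming that the cache is actually being used.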

However, the limitation of lru_cache is that each process keeps its own cache in its own memory and doesn’t share it across different instances. There is a better solution: the Flask-Caching library.

Advanced Memoization: Shared Memory Database

We can use the Flask-Caching library that saves results in a shared memory database such as Redis or our local filesystem. This library also has some excellent features like expiration token, cache reset, and many others. Here is a quick template of how to use it with our application.

from flask_caching import Cache

timeout = 60  # cache lifetime in seconds (example value, adjust to your data)


def create_dashboard(server, client):
    """Create a Plotly Dash dashboard."""
    dash_app = dash.Dash(server=server,
                         routes_pathname_prefix='/dashboard/',
                         external_stylesheets=[
                             '../static/css/main.css'
                         ]
                         )
    cache = Cache(dash_app.server, config={
        'CACHE_TYPE': 'filesystem',
        'CACHE_DIR': 'cache-directory'
    })
    dash_app.config.suppress_callback_exceptions = True
    # ..... rest of the data preparation and layout code .....
    # Pass the cache along so the callbacks can use it
    init_callbacks(dash_app, countFreq, cache)
    return dash_app.server


def init_callbacks(app, countFreq, cache):
    @app.callback(
        [dash.dependencies.Output('frequency_word_bargraph', 'figure'),
         dash.dependencies.Output('matplotlib-graph', 'src')],
        [dash.dependencies.Input('range_frequency_number', 'value')])
    # add this new line
    @cache.memoize(timeout=timeout)  # in seconds
    def update_graph(value):
        ...  # same callback body as before

WebGL: GPU Computation

You can also use WebGL to render your Plotly charts. WebGL uses the GPU to render graphics and provides a huge performance gain for datasets with more than about 15k points. Here is a simple example provided by Plotly:

import plotly.graph_objects as go
import numpy as np

fig = go.Figure()

trace_num = 10
point_num = 5000
for i in range(trace_num):
    fig.add_trace(
        go.Scattergl(
            x=np.linspace(0, 1, point_num),
            y=np.random.randn(point_num) + (i * 5)
        )
    )

fig.update_layout(showlegend=False)
fig.show()
Source:

Deployment: Heroku (Optional)

With the template we downloaded, you can easily deploy the application to Heroku.

  1. Create a free account on Heroku and install the Heroku Command Line Interface.
  2. Log in with heroku login.
  3. Create a local Git repository.

$ git init
$ git add .
$ git commit -m "initial files"

  4. Create and deploy your app on Heroku. Then open your app in Heroku.

heroku create <name_it_if_you_want>
git push heroku master
heroku open

Conclusion:

With this tutorial, you should be able to build and deploy a complete dashboard app. You can do a lot more with Dash than shown here: multiple tabs, 3D map graphs, and a graph-to-image downloader, for example. I recommend taking on a personal project with Dash or incorporating Dash into an existing Flask app.

Some last-minute tips before you create your Dashboard App:

  • Make sure you understand what story you are trying to tell with the dashboard — keep it simple and focus on the font, color, and label.
  • Download boilerplate from GitHub, no need to reinvent the wheel.
  • Add dash components and callbacks one by one, then combine them if you see redundant data or code.
  • Modularize your code with functions and individual py files. It makes debugging more straightforward.
  • Test and deploy your app at each significant step to ensure it functions as intended in production.

Our third tutorial ends here! Finally, I would like to thank my friend, Danny, for helping me with this article.

If you like this article, please like, share, and subscribe for more content. Thanks again.

If you want to check out a full production Dash app, you can visit my SBTN’s Player Comparison Data Visualization Tool and SBTN’s NBA Player Cluster Analysis.

Written by Jayce Jiang

Data Engineer at Disney, who previously worked at Bytedance.
