Sankey Diagram in Python
A Sankey diagram is a type of flow diagram that visualizes the flow of resources or information between multiple entities. Its key features are nodes, which represent the entities, and links (flows) connecting them, with the width of each link proportional to the quantity it represents.
We have covered this topic in R before; see our earlier discussion: http://www.sanaitics.com/research-paper.aspx?id=180
Getting Started
To create the Sankey plot, we will use the go.Sankey() trace from the Plotly library. If Plotly is not already installed, install it with the following command.
pip install plotly
Importing Libraries
import pandas as pd
import plotly.graph_objects as go
Importing Data
df = pd.read_csv('C:/Users/SANKHYA/Downloads/Sankeydata.csv')
df.head()
| | id | gender | field | personality |
|---|---|---|---|---|
| 0 | PT_1 | Female | Business | Introverted |
| 1 | PT_2 | Male | Law | Extroverted |
| 2 | PT_3 | Male | Science | Introverted |
| 3 | PT_4 | Female | Art | Introverted |
| 4 | PT_5 | Female | Business | Extroverted |
Breaking down the terminology:
• Node: Nodes are the source and target points in a Sankey plot. They are drawn as rectangles.
• Link: Links connect nodes and depict the flow of entities from a source category to a target category. The thickness of a link is proportional to the quantity or frequency moving between the two categories.
• Value: A value is the number attached to a link. Here it is the count of entities moving from one category to another.
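To make these three terms concrete, here is a minimal sketch in plain Python of how nodes, links, and values can be encoded as parallel lists, which is the shape go.Sankey() expects. The variable names (labels, flows, index) are illustrative only, and the four flows are a subset of the frequency table shown in this article.

```python
# Node labels: every distinct category that appears as a source or a target.
labels = ["Introverted", "Extroverted", "Art", "Law"]

# Each flow is (source label, target label, value).
flows = [
    ("Introverted", "Art", 16),
    ("Extroverted", "Art", 6),
    ("Introverted", "Law", 13),
    ("Extroverted", "Law", 9),
]

# Links refer to nodes by their position in `labels`, not by name.
index = {name: i for i, name in enumerate(labels)}
source = [index[s] for s, _, _ in flows]
target = [index[t] for _, t, _ in flows]
value = [v for _, _, v in flows]

print(source)  # [0, 1, 0, 1]
print(target)  # [2, 2, 3, 3]
print(value)   # [16, 6, 13, 9]
```

Each position across the three lists describes one link: where it starts, where it ends, and how wide it is drawn.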
Transforming Data
freq_table = df.groupby(['personality', 'field']).size().reset_index(name='n')
freq_table
| | personality | field | n |
|---|---|---|---|
| 0 | Extroverted | Art | 6 |
| 1 | Extroverted | Business | 17 |
| 2 | Extroverted | Law | 9 |
| 3 | Extroverted | Science | 7 |
| 4 | Introverted | Art | 16 |
| 5 | Introverted | Business | 15 |
| 6 | Introverted | Law | 13 |
| 7 | Introverted | Science | 17 |
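The CSV used above is a local file, so the following sketch applies the same groupby pattern to the five rows shown by df.head(), typed in by hand. The names sample and counts are hypothetical, and the result covers only those five rows, not the full dataset.

```python
import pandas as pd

# The five rows shown by df.head(), entered manually for illustration.
sample = pd.DataFrame({
    "id": ["PT_1", "PT_2", "PT_3", "PT_4", "PT_5"],
    "gender": ["Female", "Male", "Male", "Female", "Female"],
    "field": ["Business", "Law", "Science", "Art", "Business"],
    "personality": ["Introverted", "Extroverted", "Introverted",
                    "Introverted", "Extroverted"],
})

# size() counts the rows in each (personality, field) group;
# reset_index(name="n") turns the result back into a flat table.
counts = sample.groupby(["personality", "field"]).size().reset_index(name="n")
print(counts)

# Sanity check: the counts should add up to the number of input rows.
assert counts["n"].sum() == len(sample)
```

The same check is worth running on the full table. The counts shown above sum to 100, which suggests the CSV holds 100 respondents.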
Creating Nodes and Links
# Create Nodes and Links data frames
nodes = pd.DataFrame({'name': pd.unique(freq_table[['personality', 'field']].values.ravel('K'))})

links = pd.DataFrame({
    'source': [nodes[nodes['name'] == p].index[0] for p in freq_table['personality']],
    'target': [nodes[nodes['name'] == f].index[0] for f in freq_table['field']],
    'value': freq_table['n']
})
# Convert source and target columns to integers
links['source'] = links['source'].astype(int)
links['target'] = links['target'].astype(int)
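As an alternative sketch, not the approach used above, the node indices can be looked up through a dictionary instead of filtering the nodes data frame once per row. The frequency table is re-entered by hand so that the block runs on its own, and name_to_index is a hypothetical helper name. pd.concat is used here so that the node order (personalities first, then fields) does not depend on how the underlying array is laid out in memory.

```python
import pandas as pd

# Frequency table copied from the output shown earlier.
freq_table = pd.DataFrame({
    "personality": ["Extroverted"] * 4 + ["Introverted"] * 4,
    "field": ["Art", "Business", "Law", "Science"] * 2,
    "n": [6, 17, 9, 7, 16, 15, 13, 17],
})

# Node names: all personalities first, then all fields, duplicates removed.
names = pd.concat([freq_table["personality"], freq_table["field"]]).unique()
nodes = pd.DataFrame({"name": names})

# One dictionary lookup per row instead of filtering `nodes` each time.
name_to_index = {name: i for i, name in enumerate(nodes["name"])}
links = pd.DataFrame({
    "source": freq_table["personality"].map(name_to_index),
    "target": freq_table["field"].map(name_to_index),
    "value": freq_table["n"],
})

print(nodes)
print(links)
```

Because every name in the table has an entry in the dictionary, map() returns integers directly and no separate astype(int) step is needed in this sketch.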
Creating a Sankey Plot using go.Sankey()
# Create Sankey Plot
fig = go.Figure(go.Sankey(
    node=dict(pad=15,
              thickness=20,
              line=dict(color='grey', width=0.5),
              label=nodes['name']
    ),
    link=dict(source=links['source'],
              target=links['target'],
              value=links['value']
    )
))
# Set layout
fig.update_layout(title='Sankey Plot: Personality and Field', font_size=10)
# Show the plot
fig.show()
Understanding the Sankey Plot
The plot shows how introverted and extroverted personalities are distributed across the fields of Art, Business, Law, and Science. By examining the links between personality types and fields, we can compare preferences. For instance, the chart suggests that introverted individuals lean towards Art more than extroverted individuals do (16 versus 6 respondents in this sample). This points to a possible association between personality type and field preference in this data, although the plot by itself does not establish that the relationship is statistically significant.
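Because the two personality groups differ in size, raw link widths can be misleading, so it helps to compare shares within each group. The following sketch computes those shares in plain Python from the counts in the frequency table. The names counts, totals, and shares are illustrative.

```python
# Counts copied from the frequency table: (personality, field) -> n.
counts = {
    ("Extroverted", "Art"): 6,
    ("Extroverted", "Business"): 17,
    ("Extroverted", "Law"): 9,
    ("Extroverted", "Science"): 7,
    ("Introverted", "Art"): 16,
    ("Introverted", "Business"): 15,
    ("Introverted", "Law"): 13,
    ("Introverted", "Science"): 17,
}

# Total respondents per personality type.
totals = {}
for (personality, _), n in counts.items():
    totals[personality] = totals.get(personality, 0) + n

# Share of each field within its personality group.
shares = {key: n / totals[key[0]] for key, n in counts.items()}

for (personality, field), share in shares.items():
    print(f"{personality:12s} {field:9s} {share:.1%}")
```

With these counts, Art accounts for about 26% of introverted respondents (16 of 61) and about 15% of extroverted respondents (6 of 39), which is consistent with the reading of the chart given above.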
Conclusion
Sankey diagrams are an effective tool for visualizing flows between categories. Their adaptability, their ability to reveal patterns, and their effectiveness in conveying complex information make them useful for decision-makers in many fields. As datasets grow more complex, Sankey diagrams offer a clear and intuitive view of the relationships within them.