Sankey Diagram in Python

A Sankey diagram is a type of flow diagram that visualizes the flow of resources or information between multiple entities. The key features of a Sankey plot include nodes representing the entities and links or flows connecting them, with the width of the links proportional to the quantity they represent.

We have covered this topic in R before. For that discussion, see: http://www.sanaitics.com/research-paper.aspx?id=180

Getting Started

We will build the Sankey plot with the go.Sankey() function from Plotly's graph_objects module. If Plotly is not already installed, install it with the following command.

pip install plotly

Importing Libraries

import pandas as pd
import plotly.graph_objects as go

Importing Data

df = pd.read_csv('C:/Users/SANKHYA/Downloads/Sankeydata.csv')
df.head()

     id  gender     field  personality
0  PT_1  Female  Business  Introverted
1  PT_2    Male       Law  Extroverted
2  PT_3    Male   Science  Introverted
3  PT_4  Female       Art  Introverted
4  PT_5  Female  Business  Extroverted

Breaking down the terminology:

• Node: Nodes are your source and target points in a Sankey plot. They are represented by rectangles.

• Link: Links connect nodes, depicting the flow of entities from a source category to a target category. The thickness of a link is proportional to the quantity or frequency it carries.

• Value: The value is the number attached to each link. In this example it is the count of individuals who fall into both the source category and the target category.
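The three terms above map onto the inputs that go.Sankey() expects: nodes are identified by their position in a list of labels, and each link is described by a source index, a target index, and a value. The following is a minimal sketch of that encoding. The labels and numbers are hypothetical toy data, not taken from the dataset used in this article.

```python
# Hypothetical toy data: two source categories flowing into two targets
labels = ['A', 'B', 'X', 'Y']   # node labels; list position is the node's index
source = [0, 0, 1]              # index of the node where each link starts
target = [2, 3, 3]              # index of the node where each link ends
value = [5, 3, 4]               # quantity carried by each link (sets its width)

# Decode the index-based links back into readable (from, to, value) triples
flows = [(labels[s], labels[t], v) for s, t, v in zip(source, target, value)]

for src, tgt, v in flows:
    print(f'{src} -> {tgt}: {v}')
```

The three link lists must have the same length, since the i-th entries of source, target, and value together describe one link.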

Transforming Data

freq_table = df.groupby(['personality', 'field']).size().reset_index(name='n')
freq_table

   personality     field   n
0  Extroverted       Art   6
1  Extroverted  Business  17
2  Extroverted       Law   9
3  Extroverted   Science   7
4  Introverted       Art  16
5  Introverted  Business  15
6  Introverted       Law  13
7  Introverted   Science  17
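The plotting step reads node labels from nodes['name'] and link data from links['source'], links['target'], and links['value'], but the article does not show how these two tables are built. One possible way to derive them from the frequency table is sketched below. This is an assumption about the missing step, not the authors' original code. The helper name node_index is hypothetical, and freq_table is rebuilt here from the printed output so that the snippet runs on its own.

```python
import pandas as pd

# Frequency table reconstructed from the output printed above
freq_table = pd.DataFrame({
    'personality': ['Extroverted'] * 4 + ['Introverted'] * 4,
    'field': ['Art', 'Business', 'Law', 'Science'] * 2,
    'n': [6, 17, 9, 7, 16, 15, 13, 17],
})

# One node per distinct category: personalities first, then fields
labels = list(freq_table['personality'].unique()) + list(freq_table['field'].unique())
nodes = pd.DataFrame({'name': labels})

# Map each label to its zero-based position in the node list
# (node_index is a hypothetical helper name)
node_index = {name: i for i, name in enumerate(labels)}

# Each row of the frequency table becomes one link
links = pd.DataFrame({
    'source': freq_table['personality'].map(node_index),
    'target': freq_table['field'].map(node_index),
    'value': freq_table['n'],
})
```

Placing the personality labels before the field labels keeps the two groups of indices separate, so every link runs from a personality node to a field node.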

Creating a Sankey Plot using go.Sankey() function

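# Create Sankey Plot
# Note: `nodes` must provide a 'name' column of node labels, and `links` must
# provide 'source' and 'target' columns of zero-based node indices plus a
# 'value' column. Both need to be defined before this step.
fig = go.Figure(go.Sankey(
    node=dict(
        pad=15,                              # padding between nodes, in pixels
        thickness=20,                        # node thickness, in pixels
        line=dict(color='grey', width=0.5),  # node outline
        label=nodes['name']
    ),
    link=dict(
        source=links['source'],
        target=links['target'],
        value=links['value']
    )
))

# Set layout
fig.update_layout(title='Sankey Plot: Personality and Field', font_size=10)

# Show the plot
fig.show()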

Understanding the Sankey Plot

The plot shows how individuals with introverted and extroverted personalities are distributed across the four fields: Art, Business, Law, and Science. Comparing link widths gives a quick read on preferences. For instance, 16 of the 61 introverted individuals (about 26%) are in Art, compared with 6 of the 39 extroverted individuals (about 15%), so introverts in this dataset lean towards Art more than extroverts do. This hints at a possible association between personality type and field preference, although the plot alone does not establish one.
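Because the two personality groups differ in size, comparing raw link widths can mislead. Converting the counts to within-group shares gives a fairer comparison. The sketch below does this with the counts from the frequency table printed earlier. The variable names counts and shares are illustrative and not part of the original analysis.

```python
import pandas as pd

# Frequency table reconstructed from the output printed earlier
freq_table = pd.DataFrame({
    'personality': ['Extroverted'] * 4 + ['Introverted'] * 4,
    'field': ['Art', 'Business', 'Law', 'Science'] * 2,
    'n': [6, 17, 9, 7, 16, 15, 13, 17],
})

# Rows: personality, columns: field, cells: counts
counts = freq_table.pivot(index='personality', columns='field', values='n')

# Divide each row by its total to get the share of each field within a group
shares = counts.div(counts.sum(axis=1), axis=0).round(3)
print(shares)
```

Each row of shares sums to roughly 1, so the Art column can be compared directly across the two personality types.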

Conclusion

Sankey diagrams are a compact way to show how quantities flow between categories. Because link widths are proportional to the values they carry, dominant and unusually thin flows stand out at a glance, which makes the chart useful for communicating relationships between categorical variables to decision-makers in many fields. With Plotly's go.Sankey() function, a frequency table like the one above can be turned into an interactive Sankey plot in a few lines of Python.