Sankey charts are flow diagrams in which the width of each link is proportional to the quantity it carries. Named after the engineer Matthew Sankey, who used one to illustrate the energy flows in a steam engine, they work well when representing connections between multiple data sources. Flow is conventionally drawn from left to right, although other orientations are possible.
Before diving into the process of creating a Sankey chart, one must first define its structure. A Sankey diagram usually consists of three elements: source nodes, target nodes, and the weighted links that connect them. A node can play both roles, acting as the target of one link and the source of another.
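To make that structure concrete, here is a minimal Python sketch that stores a diagram as a list of (source, target, weight) links and sums the flow into and out of each node. The node names, the weights, and the helper name node_totals are illustrative assumptions, not part of any library.

```python
from collections import defaultdict

# Each link is (source, target, weight); the values are illustrative only.
links = [
    ("A", "B", 4.0),
    ("A", "C", 2.0),
    ("B", "C", 1.0),
]


def node_totals(links):
    """Return {node: (inflow, outflow)} summed over all links."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, weight in links:
        outflow[source] += weight
        inflow[target] += weight
    nodes = set(inflow) | set(outflow)
    return {n: (inflow[n], outflow[n]) for n in sorted(nodes)}


totals = node_totals(links)
for node, (flow_in, flow_out) in totals.items():
    print(node, flow_in, flow_out)
```

Here B is both a target (of the link from A) and a source (of the link to C), which is why each node gets an inflow and an outflow total.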
Creating and applying Sankey charts in R
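R has several packages for drawing Sankey charts. Two common options are sankeyNetwork from the networkD3 package, which renders an interactive diagram, and geom_sankey from the ggsankey package, which adds a Sankey layer to a ggplot2 plot.
Here's an example of a Sankey chart created in R with networkD3. Sankey layouts need an acyclic graph, so no link may point back to its own source:
```r
library(networkD3)

# Create a data frame of links (one row per flow)
df_sankey <- data.frame(
  source = c("A", "A", "A", "B", "B"),
  target = c("B", "B", "C", "C", "C"),
  weight = c(4, 3, 2, 1, 1),
  stringsAsFactors = FALSE
)

# networkD3 expects a node table and zero-based node indices
nodes <- data.frame(name = unique(c(df_sankey$source, df_sankey$target)))
df_sankey$source_id <- match(df_sankey$source, nodes$name) - 1
df_sankey$target_id <- match(df_sankey$target, nodes$name) - 1

# Create a Sankey chart
g_sankey <- sankeyNetwork(
  Links = df_sankey, Nodes = nodes,
  Source = "source_id", Target = "target_id",
  Value = "weight", NodeID = "name"
)
g_sankey
```
sankeyNetwork creates the Sankey chart, and g_sankey is its output: an HTML widget built from the link and node tables you passed to the function. The height of each node reflects the total flow passing through it.
The source is the node where a flow starts, target is the node where it ends, and weight is the size of the flow, drawn as the width of the link. networkD3 lays the diagram out from left to right.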
Sankey charts in Python
In Python, pandas' read_csv can load the link table and plotly's Sankey trace can draw it. The example below assumes that sankey_data.csv has source, target, and weight columns:
```python
import pandas as pd
import plotly.graph_objects as go

# Load the data (assumed columns: source, target, weight)
df_sankey = pd.read_csv('sankey_data.csv')

# Map each node label to an integer index, as plotly expects
labels = pd.unique(df_sankey[['source', 'target']].values.ravel()).tolist()
index = {label: i for i, label in enumerate(labels)}

# Build the Sankey trace from the link table
g_sankey = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=df_sankey['source'].map(index).tolist(),
        target=df_sankey['target'].map(index).tolist(),
        value=df_sankey['weight'].tolist(),
    ),
))
g_sankey.show()
```
Here, g_sankey is the plotly Figure that wraps the Sankey trace. Calling show() renders the diagram, with each link drawn from its source to its target at a width proportional to its weight.
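Link tables often repeat a source/target pair, as the two A to B rows in the R example do. One way to tidy such a table before plotting is to sum the weights of duplicate pairs. The sketch below uses only the standard library; the function name aggregate_links and the sample rows are hypothetical.

```python
from collections import defaultdict


def aggregate_links(rows):
    """Sum the weights of rows that share the same (source, target) pair."""
    summed = defaultdict(float)
    for source, target, weight in rows:
        summed[(source, target)] += weight
    # Dicts keep insertion order, so pairs appear in first-seen order.
    return [(s, t, w) for (s, t), w in summed.items()]


rows = [("A", "B", 4), ("A", "B", 3), ("B", "C", 1), ("B", "C", 1)]
merged = aggregate_links(rows)
print(merged)
```

Whether to merge duplicates is a design choice: merging gives one wide link per pair, while keeping them separate lets each row be styled or labelled on its own.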
Sankey Chart in Julia
In Julia, we can use CSV and DataFrames to load the data and a plotting package to draw it. The example below assumes the SankeyPlots package, whose sankey function takes integer source indices, integer target indices, and a vector of weights. It also assumes that both files have an index column and that the target file has a value column. For instance:
```julia
using CSV, DataFrames, SankeyPlots

# Load the data (assumed columns: index in both files, value in the target file)
src = CSV.read("@my-source-data@/data.csv", DataFrame)
tgt = CSV.read("@my-target-data@/data.csv", DataFrame)

# Create a Sankey graph: row i defines one link from src.index[i] to tgt.index[i]
sankey(src.index, tgt.index, tgt.value)
```
In this example, the sankey function adds the connections between the source and target nodes and sizes each link according to its weight.
In conclusion, Sankey diagrams are an excellent choice for visualizing flows between categories. They make the connections and their relative sizes easy to see, which helps the reader interpret complex data. R, Python, and Julia all offer ways to generate Sankey charts from a simple table of sources, targets, and weights.