From Data to Insight: Visualizing Quantities, Proportions, Relationships, and Distributions with Python’s Matplotlib and Seaborn

Lukman Aliyu
May 13, 2023

Data visualization is a crucial part of data analysis: it helps us understand complex relationships, patterns, and insights in data and communicate them to a wider audience. Python has become a popular language for data visualization thanks to powerful libraries such as Matplotlib and Seaborn. In this post, we’ll look at how to visualize quantities, proportions, relationships, and distributions using these libraries and how to draw useful insights from the results.

Visualizing Quantities

Quantitative data is any information that can be measured and expressed numerically. Visualizing it makes it easier to see how values are distributed and spread out, to spot outliers, and to examine how variables relate to one another.

Histograms are one of the most commonly used tools for visualizing quantitative data. A histogram groups continuous values into bins and shows how many observations fall into each bin. Both Matplotlib and Seaborn can produce one in a few lines of code. Let’s import the required libraries and then look at an example:

# import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Generate 1,000 random values from a standard normal distribution
data = np.random.normal(size=1000)
# Create a histogram using Matplotlib
plt.hist(data, bins=30)
plt.title('Matplotlib Histogram of Data')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()
Fig.1: Histogram produced using Matplotlib
# Create a histogram using Seaborn
sns.histplot(data, bins=30)
plt.title('Seaborn Histogram of Data')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()
Fig.2: Histogram produced using Seaborn

In this example, we generate random data with NumPy’s random.normal function and then plot it twice, once with Matplotlib and once with Seaborn. The two histograms look similar, but Seaborn’s histplot function offers a few extra features, such as overlaying a KDE (kernel density estimate) on the histogram with kde=True.
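To make the idea of a frequency distribution concrete, the sketch below computes the bin counts directly with NumPy’s histogram function and compares them with the bar heights that Matplotlib draws. The seed value and the variable names are assumptions chosen for illustration; they are not part of the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the numbers are reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)
data = rng.normal(size=1000)

# Count how many values fall into each of 30 equal-width bins
counts, edges = np.histogram(data, bins=30)

# Matplotlib returns the same counts and bin edges it uses for the bars
fig, ax = plt.subplots()
plotted_counts, plotted_edges, _ = ax.hist(data, bins=30)
ax.set_xlabel('Values')
ax.set_ylabel('Frequency')

print('Number of bins:', len(counts))
print('Total observations:', counts.sum())
```

Every observation lands in exactly one bin, so the counts add up to the sample size. Call plt.show() afterwards to display the figure.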

Visualizing Proportions

Proportional data is any information that can be expressed as a percentage or a fraction of a whole. Visualizing it makes it easier to compare the relative sizes of categories or groups and to see how each contributes to the total. Pie charts, donut charts, and waffle charts are common ways to show proportions.

Pie charts are one of the most commonly used tools for representing proportional data. A pie chart is a circular graph divided into wedges, each showing one category’s share of the whole. Matplotlib provides plt.pie for this. Seaborn has no pie chart function of its own, but its color palettes can be passed to plt.pie to style the chart. Let’s look at an example:

data = [25, 40, 10, 25]
labels = ['A', 'B', 'C', 'D']
# Create a pie chart using Matplotlib
plt.pie(data, labels=labels)
plt.title('Matplotlib Proportional Data')
plt.show()
Fig.3: Matplotlib Pie Chart
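# Create a pie chart with a Seaborn color palette
# (Seaborn has no pie function, so Matplotlib still draws the chart)
sns.set_style('whitegrid')
# color_palette returns a list of colors; keep it and pass it to plt.pie
colors = sns.color_palette('pastel')
plt.pie(data, labels=labels, colors=colors)
plt.title('Seaborn Proportional Data')
plt.show()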
Fig.4: Seaborn Pie Chart

In this example, we define a small set of values and matching labels by hand and then draw two pie charts. The first uses Matplotlib’s default colors. The second is also drawn with plt.pie, since Seaborn has no pie chart function of its own, but takes its wedge colors from a palette returned by Seaborn’s color_palette function.
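A donut chart, mentioned above as an alternative to the pie chart, can be sketched with the same plt.pie machinery. The version below is a minimal illustration, assuming the same four values as before: it narrows each wedge with wedgeprops and prints each category’s percentage with autopct. The format string, start angle, and ring width are arbitrary choices.

```python
import matplotlib.pyplot as plt

data = [25, 40, 10, 25]
labels = ['A', 'B', 'C', 'D']

fig, ax = plt.subplots()
# width < 1 leaves a hole in the middle, turning the pie into a donut;
# autopct writes each wedge's share of the total as a percentage
wedges, texts, autotexts = ax.pie(
    data,
    labels=labels,
    autopct='%1.0f%%',
    startangle=90,
    wedgeprops={'width': 0.4},
)
ax.set_title('Donut Chart of Proportional Data')

# Collect the percentage labels that were drawn on the wedges
percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)
```

Printing the percentages on the wedges saves the reader from estimating angles by eye, which is the usual weakness of pie-style charts. Call plt.show() afterwards to display the figure.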

Visualizing Relationships

Relationship data is any data that shows how two or more variables are connected. Visualizing it helps us understand how strongly the variables are correlated, in which direction, and whether the relationship is linear or nonlinear.

Scatter plots are among the most commonly used tools for displaying relationship data. A scatter plot draws one point per observation, with one variable on each axis, so the relationship between the two variables shows up as a pattern in the points. Both Matplotlib and Seaborn include tools for making scatter plots. Let’s look at an example:

x = np.random.normal(size=1000)
y = 2 * x + np.random.normal(size=1000)

# Create a scatter plot using Matplotlib
plt.scatter(x, y)
plt.title('Matplotlib Scatter Plot of Data')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Fig.5: Matplotlib Scatterplot
# Create a scatter plot using Seaborn
sns.scatterplot(x=x, y=y)
plt.title('Seaborn Scatter Plot of Data')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Fig.6: Seaborn Scatterplot

In this example, we generate two random data sets, x and y, where y is a noisy linear function of x. We then plot them twice, once with Matplotlib and once with Seaborn. Seaborn’s scatterplot function offers a few extras, such as coloring points by a third variable through its hue parameter.
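The following sketch shows two ways to get more out of the scatter plot: measuring the strength of the linear relationship with NumPy’s corrcoef, and coloring points by a third variable with hue. The group variable is hypothetical and is derived from x purely for illustration; the seed is also an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Seeded generator so the result is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

# Hypothetical third variable: label each point by the sign of x
group = np.where(x > 0, 'x > 0', 'x <= 0')

fig, ax = plt.subplots()
sns.scatterplot(x=x, y=y, hue=group, ax=ax)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title(f'Scatter Plot Colored by Group (r = {r:.2f})')

print(f'Correlation coefficient: {r:.3f}')
```

Because y is built as 2 * x plus unit-variance noise, the correlation should come out strongly positive, which matches the upward-sloping cloud in the plot. Call plt.show() afterwards to display the figure.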

Visualizing Distributions

Distribution data shows how often different values or ranges of values occur. Visualizing it makes it easier to understand the shape of the distribution, spot outliers, and judge how likely particular values or ranges are.

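Density plots are one of the most popular tools for displaying distribution data. A density plot shows a smoothed estimate of the data’s probability density function. Seaborn provides kernel density estimation directly (kdeplot, or histplot with kde=True). Matplotlib has no built-in KDE function, but a histogram drawn with density=True is scaled to the same density axis, so a Seaborn KDE curve can be overlaid on it. Let’s look at an example: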

data = np.random.normal(size=1000)
# Create a density plot: Matplotlib histogram with a Seaborn KDE curve overlaid
plt.hist(data, density=True, alpha=0.5, bins=30)
sns.kdeplot(data, color='red')
plt.title('Matplotlib Density Plot of Data')
plt.xlabel('Values')
plt.ylabel('Probability Density')
plt.show()
Fig.7: Matplotlib Density Plot
# Create a density plot using Seaborn
sns.histplot(data, kde=True, stat='density', alpha=0.5, bins=30)
plt.title('Seaborn Density Plot of Random Data')
plt.xlabel('Values')
plt.ylabel('Probability Density')
plt.show()
Fig.8: Seaborn Density Plot

In this example, we again generate random data with NumPy’s random.normal function and then draw two density plots. The first overlays a Seaborn KDE curve on a Matplotlib histogram normalized with density=True. The second produces the histogram and the density curve in a single call to Seaborn’s histplot, using kde=True and stat='density'.
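To show what a kernel density estimate actually computes, here is a minimal hand-written version using only NumPy and Matplotlib. It is a sketch under stated assumptions, not a reproduction of Seaborn’s implementation: the helper gaussian_kde_1d is hypothetical, the kernel is Gaussian, and the bandwidth comes from a simple rule of thumb (sample standard deviation times n to the power -1/5).

```python
import numpy as np
import matplotlib.pyplot as plt


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average a Gaussian bump centered on every sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth


# Seeded generator so the result is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

# Rule-of-thumb bandwidth (an assumption, other choices are possible)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(-5, 5, 501)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A probability density should integrate to about 1 (simple Riemann sum)
area = np.sum(density) * (grid[1] - grid[0])

fig, ax = plt.subplots()
ax.hist(samples, bins=30, density=True, alpha=0.5)
ax.plot(grid, density, color='red')
ax.set_xlabel('Values')
ax.set_ylabel('Probability Density')

print(f'Area under the estimated density: {area:.3f}')
```

A smaller bandwidth makes the curve follow the individual samples more closely, while a larger one smooths out detail. Call plt.show() afterwards to display the figure.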

Conclusion

In this article, we looked at how to visualize quantities, proportions, relationships, and distributions using Matplotlib and Seaborn. Data visualization is a powerful tool for understanding and communicating complex relationships, trends, and insights in data. With these libraries, we can draw useful insights from our data and use them to make well-informed decisions.

Written by Lukman Aliyu

Pharmacist enthusiastic about Data Science/AI/ML| Fellow, Arewa Data Science Academy
