Python Data Visualization: Exploring Data with Pandas, Matplotlib, and Seaborn

Lukman Aliyu
Mar 15, 2023
Photo by Markus Winkler on Unsplash

Data visualization is a crucial part of data analysis because it helps us understand our data better. Python provides several libraries for data visualization, and in this article we will explore three of the most commonly used ones: pandas, matplotlib, and seaborn.

Pandas is a powerful library for data manipulation and analysis. It provides data structures for efficiently storing and manipulating large datasets. One of the primary data structures provided by pandas is the DataFrame, a two-dimensional, table-like structure with labeled rows and columns. The pandas library also provides many methods for cleaning, merging, and transforming data.
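To make those three kinds of operation concrete, here is a small sketch. The two tables are hypothetical toy data used only to illustrate the API; `dropna`, `merge`, and `assign` are standard DataFrame methods.

```python
import pandas as pd

# Hypothetical toy tables, used only to illustrate the API
measurements = pd.DataFrame({"id": [1, 2, 3], "petal_length": [1.4, None, 5.1]})
labels = pd.DataFrame({"id": [1, 2, 3], "species": ["setosa", "versicolor", "virginica"]})

# Cleaning: drop rows that have a missing measurement
cleaned = measurements.dropna()

# Merging: join the two tables on their shared "id" column
merged = cleaned.merge(labels, on="id")

# Transformation: derive a new column (centimetres to millimetres)
merged = merged.assign(petal_length_mm=merged["petal_length"] * 10)
print(merged)
```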

Matplotlib is a plotting library for Python that provides a wide range of 2D plotting capabilities. It gives you fine-grained control over nearly every element of a figure, which makes it possible to build complex, highly customized visualizations. Matplotlib also supports a wide range of output formats, including PDF, PNG, and SVG.
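As a quick illustration of the output formats, the sketch below saves one figure as PDF, PNG, and SVG. `Figure.savefig` infers the format from the file extension. The temporary directory and the file name `figure` are arbitrary choices for this example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])  # any simple data will do

# Save the same figure in three formats; the extension selects the format
out_dir = tempfile.mkdtemp()
paths = []
for ext in ("pdf", "png", "svg"):
    path = os.path.join(out_dir, f"figure.{ext}")
    fig.savefig(path)
    paths.append(path)

print(paths)
```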

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics. Seaborn works directly with pandas DataFrames and ships with built-in themes and color palettes, so complex visualizations often take only a line or two of code.
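The themes and palettes are a single function call away. The sketch below applies the `whitegrid` style with the `deep` palette, which are arbitrary picks here. It also shows that a palette is just a list of RGB tuples that you can pass to other plotting functions.

```python
import seaborn as sns

# Apply one of seaborn's built-in themes and palettes to all later plots
sns.set_theme(style="whitegrid", palette="deep")

# Palettes are lists of RGB tuples; n_colors truncates or cycles the palette
deep = sns.color_palette("deep")
muted = sns.color_palette("muted", n_colors=3)
print(len(deep), len(muted))
```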

Let’s start by exploring the pandas library.

Data visualization with pandas

Pandas provides a number of methods for data visualization. One of the most commonly used is the plot method, which draws with matplotlib under the hood and can create a wide range of visualizations, including line plots, scatter plots, bar plots, and histograms.

To demonstrate the use of pandas for data visualization, let's start by loading a sample dataset. We will use the famous Iris dataset, which contains the lengths and widths of the petals and sepals of 150 Iris flowers, 50 from each of three species. I first came across this dataset when I started learning R's tidyverse.
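Here is a minimal sketch of the loading step. It assumes scikit-learn's bundled copy of Iris, which works offline, and renames its columns to the `petal_length` style used in the plot captions. `sns.load_dataset("iris")` returns the same column names, but it downloads the data from the seaborn-data repository the first time it runs.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of Iris (no download needed)
raw = load_iris(as_frame=True)

# Rename "petal length (cm)" -> "petal_length", etc., to match seaborn's naming
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))

# Replace the integer "target" column with readable species names
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

print(iris.head())
print(iris["species"].value_counts())
```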

We can use the plot method to create a scatter plot of the petal length and width for the Iris dataset:
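The sketch below shows one way to do this. It loads Iris from scikit-learn as in the previous snippet, which is an assumption. The plot itself is a single call, and `DataFrame.plot` returns the matplotlib Axes it drew on.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: Iris from scikit-learn, renamed to seaborn-style columns
raw = load_iris(as_frame=True)
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

# One line with pandas: kind="scatter" needs both x and y column names
ax = iris.plot(kind="scatter", x="petal_length", y="petal_width")
plt.show()
```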

Plot 1: Scatterplot of Iris petal length vs petal width using pandas

As the example above shows, pandas can create a scatter plot with a single line of code. We can also customize the plot by adding axis labels and a title:
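A sketch of that customization follows. The `title` keyword is accepted by `DataFrame.plot`, and the axis labels are set on the returned Axes. The label and title strings are illustrative choices, and the data loading again assumes scikit-learn's copy of Iris.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: Iris from scikit-learn, renamed to seaborn-style columns
raw = load_iris(as_frame=True)
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

ax = iris.plot(
    kind="scatter",
    x="petal_length",
    y="petal_width",
    title="Iris: petal length vs petal width",  # illustrative title
)
# Replace the default column-name labels with readable ones
ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Petal width (cm)")
plt.show()
```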

Plot 2: Adding custom labels to the pandas scatterplot

In addition to scatter plots, pandas can also be used to create line plots, bar plots, and histograms. Let’s create a bar plot of the mean petal length for each of the three species of Iris:
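One way to do this, sketched under the same scikit-learn loading assumption, is to group by species first. `groupby(...).mean()` returns a Series indexed by species, and `Series.plot(kind="bar")` turns each entry into a bar.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: Iris from scikit-learn, renamed to seaborn-style columns
raw = load_iris(as_frame=True)
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

# Mean petal length per species: a Series indexed by species name
mean_petal = iris.groupby("species")["petal_length"].mean()

# rot=0 keeps the species names horizontal under the bars
ax = mean_petal.plot(kind="bar", rot=0, title="Mean petal length per species")
ax.set_ylabel("Mean petal length (cm)")
plt.show()
```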

Plot 3: Bar plots of mean petal lengths per species using pandas

Data visualization with matplotlib

While pandas provides a number of convenient methods for data visualization, matplotlib provides a lower-level interface for creating visualizations. This makes it possible to create highly customized visualizations, but it requires a bit more effort. Let's see how we can use matplotlib to create a scatter plot of the petal length and width for the Iris dataset.
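Here is a sketch of the matplotlib version, again assuming scikit-learn's copy of Iris. Coloring the points by species is an illustrative choice: the code draws one `ax.scatter` layer per group, so each species gets its own color and legend entry.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: Iris from scikit-learn, renamed to seaborn-style columns
raw = load_iris(as_frame=True)
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

fig, ax = plt.subplots(figsize=(6, 4))

# One scatter layer per species, labeled so the legend can pick it up
for species, group in iris.groupby("species"):
    ax.scatter(group["petal_length"], group["petal_width"], label=species, alpha=0.7)

ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Petal width (cm)")
ax.set_title("Iris: petal length vs petal width")
ax.legend(title="Species")
plt.show()
```

Every element here (figure size, transparency, labels, legend title) is set explicitly, which is the trade-off the paragraph above describes.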

Plot 4: Scatterplot of petal length vs width using matplotlib

Creating a scatter plot with matplotlib takes a bit more code than with pandas, but in exchange every element of the plot can be customized. We can also create a bar plot of the mean petal length for each of the three species of Iris:
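The sketch below follows the caption of the next plot and draws bars plus an overlaid line for the per-species means. It assumes scikit-learn's copy of Iris, and the hex colors are arbitrary. Matplotlib treats the list of species names as categorical x values for both `ax.bar` and `ax.plot`.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: Iris from scikit-learn, renamed to seaborn-style columns
raw = load_iris(as_frame=True)
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

mean_petal = iris.groupby("species")["petal_length"].mean()
names = list(mean_petal.index)

fig, ax = plt.subplots(figsize=(6, 4))
# Bars for the per-species means (colors are arbitrary choices)...
ax.bar(names, mean_petal.values, color=["#4c72b0", "#dd8452", "#55a868"], alpha=0.8)
# ...overlaid with a line and markers joining the same values
ax.plot(names, mean_petal.values, color="black", marker="o")

ax.set_xlabel("Species")
ax.set_ylabel("Mean petal length (cm)")
ax.set_title("Mean petal length per species")
plt.show()
```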

Plot 5: Customised bar and line plot for mean petal length per species using matplotlib

As the plot above shows, matplotlib provides a lot of flexibility for creating customized visualizations.

Data visualization with seaborn

To create a scatter plot of the petal length and width for the Iris dataset with seaborn, we can use the scatterplot function:
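A sketch of that call, assuming scikit-learn's copy of Iris, is shown below. Passing `hue="species"` is an illustrative choice: it makes seaborn color the points by species and add a legend automatically. Seaborn also labels the axes with the column names.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Assumption: Iris from scikit-learn, renamed to seaborn-style columns
raw = load_iris(as_frame=True)
iris = raw.frame.rename(columns=lambda c: c.replace(" (cm)", "").replace(" ", "_"))
iris["species"] = iris.pop("target").map(dict(enumerate(raw.target_names)))

# One call: seaborn reads columns by name from the DataFrame
ax = sns.scatterplot(data=iris, x="petal_length", y="petal_width", hue="species")
plt.show()
```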

Plot 6: Scatterplot of iris petal length vs width using seaborn

As the plot above shows, seaborn produces a clean and attractive visualization with just a single line of code.

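We can also create a line plot of the mean petal length for each of the three species of Iris with the lineplot function. When several rows share the same x value, lineplot aggregates them (using the mean by default) and draws a confidence interval around the estimate: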

Plot 7: Line plot of mean petal length per species using seaborn

As the plot above shows, seaborn produces a more polished and informative visualization than matplotlib with very little code.

Seaborn also provides many other types of visualizations, including heatmaps, histograms, and violin plots, that can be used to explore relationships in your data.

Conclusion

In this article, we have seen how to use Python libraries such as pandas, matplotlib, and seaborn for data visualization. Data visualization is an essential part of the data analysis process because it allows us to explore and understand our data more effectively.

Pandas provides a convenient interface for creating basic visualizations such as scatter plots and bar charts. Matplotlib provides a high level of customization and control over the appearance of your visualizations, but requires more code to create them. Seaborn provides a high-level interface for creating attractive and informative statistical graphics with minimal code.

As with any tool, it is important to choose the right visualization library for your specific needs. If you need to create simple visualizations quickly and easily, pandas may be the best choice. If you need a high level of control over the appearance of your visualizations, matplotlib may be the better option. If you want to create complex statistical graphics quickly and easily, seaborn may be the way to go. Seaborn is my current favourite. In any case, with the power of Python and these visualization libraries, you can create informative and attractive visualizations that help you better understand your data.

Lukman Aliyu

Pharmacist enthusiastic about Data Science/AI/ML| Fellow, Arewa Data Science Academy