Welcome to Colabcodes, where innovation drives technology forward. Explore the latest trends, practical programming tutorials, and in-depth insights across software development, AI, ML, NLP and more. Connect with our experienced freelancers and mentors for personalised guidance and support tailored to your needs.

Descriptive Analytics in Python: Central Tendency, Dispersion, Charts & Clustering

  • Writer: Samul Black
  • Jan 7, 2024
  • 8 min read

Updated: Jul 23

Descriptive analytics helps you make sense of raw data by summarizing it through statistics and visualizations. In this guide, you'll learn key concepts like central tendency and dispersion, and explore various charts and clustering techniques, all with hands-on Python examples to help you analyze and present data clearly.

What is Descriptive Analytics?

Descriptive analytics is the foundational stage of data analytics. It examines historical data to understand patterns, trends, and relationships within a dataset, summarizing and visualizing that data to describe what has happened in the past and to provide context for further analysis. This initial phase forms the groundwork for more advanced analytics, including predictive and prescriptive analytics: by providing a comprehensive understanding of historical data, it helps organizations make informed decisions and identify opportunities for improvement.


Key Aspects of Descriptive Analytics:

Descriptive analytics revolves around transforming raw data into meaningful summaries that offer insight into past performance. It plays a crucial role in helping organizations track key metrics, measure progress, and make sense of historical patterns. Below are some of the most essential aspects that define descriptive analytics and make it a cornerstone of data-driven decision-making:


Data Collection and Cleaning:

Descriptive analytics begins with data collection from various sources, followed by data cleaning to ensure accuracy and consistency. It involves handling missing values, outliers, and formatting issues.
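As a minimal sketch of these cleaning steps, the standard-library code below normalizes a handful of hypothetical raw readings, imputes the missing entries with the median, and flags outliers using the common 1.5 × IQR rule. The raw values, the set of missing-value markers, and the choice of median imputation are illustrative assumptions, not a prescribed procedure.

```python
import statistics

# Hypothetical raw readings as they might arrive from a CSV export:
# stray whitespace, several spellings of "missing", and one suspicious value.
raw = [" 12.5", "13.1 ", "", "N/A", "12.9", "131", "13.4", "na"]

MISSING = {"", "na", "n/a", "null"}  # assumed markers for missing values

def clean(value):
    """Strip whitespace; return a float, or None if the entry is missing."""
    text = value.strip()
    if text.lower() in MISSING:
        return None
    return float(text)

parsed = [clean(v) for v in raw]
present = [v for v in parsed if v is not None]

# Impute missing entries with the median, which is robust to the outlier.
fill = statistics.median(present)
imputed = [fill if v is None else v for v in parsed]

# Flag outliers with the 1.5 x IQR rule (131 looks like a dropped decimal point).
q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
iqr = q3 - q1
outliers = [v for v in present if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print("Imputed:", imputed)
print("Outliers:", outliers)
```

Whether a flagged value is corrected, dropped, or kept is a judgment call that depends on the data source.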


Data Summarization and Aggregation:

Summarizing data using statistical measures like mean, median, mode, and standard deviation helps in understanding the central tendencies and distributions within the dataset. Aggregating data into categories or groups provides a high-level overview.
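A small sketch of grouping and aggregation with the standard library, using hypothetical order records: values are collected per region, then each group is summarized with a count, a total, and a mean. In practice pandas (`DataFrame.groupby`) is the usual tool, but the idea is the same.

```python
import statistics
from collections import defaultdict

# Hypothetical order records: (region, order value)
orders = [("North", 120), ("South", 80), ("North", 100),
          ("East", 90), ("South", 60), ("North", 110)]

# Group: collect the values belonging to each region
groups = defaultdict(list)
for region, value in orders:
    groups[region].append(value)

# Aggregate: one summary row per group
summary = {
    region: {"count": len(vals), "total": sum(vals), "mean": statistics.mean(vals)}
    for region, vals in groups.items()
}

for region, metrics in summary.items():
    print(region, metrics)
```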


Visualization Techniques:

Visualization tools such as charts, graphs, histograms, and heatmaps help in presenting data visually. This aids in identifying trends, patterns, and outliers more intuitively.


Exploratory Data Analysis (EDA):

EDA techniques, like scatter plots, box plots, and correlation matrices, facilitate in-depth exploration of relationships between variables, uncovering insights that can guide further analysis.


Techniques and Methods in Descriptive Analytics with examples in Python

Descriptive analytics encompasses various techniques and methods to summarize, visualize, and understand historical data. Here's a detailed account of the techniques involved in descriptive analytics:


Measures of Central Tendency:

Measures of Central Tendency are statistical tools that summarize a dataset with a single representative value indicating the center or typical value of the distribution.


  1. Mean: The (arithmetic) mean is the average of the given numbers: the sum of the values divided by the number of values.

  2. Median: The median is the middle value in a set of data. First, sort the data from smallest to largest. To find the midpoint, divide the number of observations by two. If there is an odd number of observations, round that number up, and the value in that position is the median; if there is an even number, the median is the average of the value in that position and the next one.

  3. Mode: The mode is the value that occurs most frequently in a set of values. A dataset can have more than one mode.


Python example demonstrating how to calculate Mean, Median, and Mode—the three fundamental measures of central tendency.

import statistics

data = [10, 15, 10, 20, 25, 30, 10, 20, 25]

print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
print("Mode:", statistics.mode(data))  # Python 3.8+: returns the first mode found; use statistics.multimode() for all modes

Output:
Mean: 18.333333333333332
Median: 20
Mode: 10
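The library call hides the odd/even rule for the median described above. A hand-rolled version makes that rule explicit; the `median` helper here is written only for illustration, and `statistics.multimode` is shown for data with more than one mode.

```python
import statistics

def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [10, 15, 10, 20, 25, 30, 10, 20, 25]   # 9 values
even = [10, 15, 10, 20, 25, 30]              # 6 values

print(median(odd))                             # 20
print(median(even))                            # 17.5
print(statistics.median(even))                 # 17.5
print(statistics.multimode([1, 1, 2, 2, 3]))   # [1, 2]
```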

Measures of Dispersion:

Measures of dispersion describe the spread or variability of data points around a central value, showing how much the data deviate from the average.


  1. Standard Deviation: Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.

  2. Variance: Variance measures the spread between numbers in a data set. More specifically, it is the average of the squared differences between each value and the mean. Population variance divides the sum of squared differences by n, while sample variance divides by n - 1, which is why the two variance results below differ.

  3. Range: The range in statistics for a given data set is the difference between the highest and lowest values.


Here’s a Python-based breakdown of three important Measures of Dispersion: Range, Variance, and Standard Deviation.

import statistics

# Sample data
data = [10, 15, 10, 20, 25, 30, 10, 20, 25]

# Range
range_value = max(data) - min(data)

# Mean
mean = sum(data) / len(data)

# Variance (manual)
squared_diffs = [(x - mean) ** 2 for x in data]
variance_manual = sum(squared_diffs) / len(data)  # population variance

# Variance using statistics module (sample variance)
variance_stats = statistics.variance(data)
population_variance = statistics.pvariance(data)

# Standard Deviation (manual)
std_dev_manual = variance_manual ** 0.5

# Standard Deviation using statistics module
std_dev_stats = statistics.stdev(data)  # sample std dev
population_std_dev = statistics.pstdev(data)

# Results
print("Range:", range_value)
print("Variance (Manual, Population):", variance_manual)
print("Variance (statistics, Sample):", variance_stats)
print("Standard Deviation (Manual, Population):", std_dev_manual)
print("Standard Deviation (statistics, Sample):", std_dev_stats)

Output:

Range: 20
Variance (Manual, Population): 49.99999999999999
Variance (statistics, Sample): 56.25
Standard Deviation (Manual, Population): 7.071067811865475
Standard Deviation (statistics, Sample): 7.5

Graphical Representation:

This involves using visual elements like charts, plots, and graphs to make data patterns, trends, and distributions easier to understand at a glance.


Histograms

A histogram shows the distribution of numerical data by grouping values into contiguous, equal-width ranges (bins) and drawing a rectangle for each bin. The height of each rectangle (the vertical axis) represents the frequency: how many values fall within that bin.


Python implementation of histograms.

import matplotlib.pyplot as plt

# Sample data
data = [10, 15, 10, 20, 25, 30, 10, 20, 25, 30, 35, 40, 25, 30, 20]

# Create histogram
plt.hist(data, bins=6, color='skyblue', edgecolor='black')

# Add titles and labels
plt.title("Histogram of Sample Data")
plt.xlabel("Value Range")
plt.ylabel("Frequency")

# Show plot
plt.show()

Output:

[Figure: Histogram of Sample Data]
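To see the numbers behind the bars, numpy's `np.histogram` (which matplotlib relies on when drawing histograms) returns the bin counts and edges directly. With `bins=6`, the range 10 to 40 is split into six bins of width 5.

```python
import numpy as np

data = [10, 15, 10, 20, 25, 30, 10, 20, 25, 30, 35, 40, 25, 30, 20]

# Six equal-width bins from min(data) to max(data), as in plt.hist(data, bins=6)
counts, edges = np.histogram(data, bins=6)

# Each bin is half-open [left, right), except the last, which also includes 40
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:g} to {right:g}: {count}")
```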

Bar Charts

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.


Python implementation of Bar Charts.

import matplotlib.pyplot as plt

# Sample categorical data
categories = ['Python', 'Java', 'C++', 'JavaScript', 'Go']
values = [60, 45, 30, 50, 20]

# Create bar chart
plt.bar(categories, values, color='mediumseagreen', edgecolor='black')

# Add labels and title
plt.title("Programming Language Popularity")
plt.xlabel("Languages")
plt.ylabel("Number of Users (in millions)")

# Display the chart
plt.show()

Output:

[Figure: Programming Language Popularity bar chart]

Line Charts

Line charts are a fundamental chart type, generally used to show how values change over time.

Python implementation of Line Charts.

import matplotlib.pyplot as plt

# Sample time series data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [250, 300, 280, 350, 400, 420]

plt.plot(months, sales, marker='o', color='teal', linestyle='-')
plt.title("Monthly Sales Trend")
plt.xlabel("Month")
plt.ylabel("Sales ($)")
plt.grid(True)
plt.tight_layout()
plt.show()

Output:

[Figure: Monthly Sales Trend line chart]

Pie Charts

A pie chart is a type of graph representing data in a circular form, with each slice of the circle representing a fraction or proportionate part of the whole. All slices of the pie add up to make the whole equaling 100 percent and 360 degrees.


Python implementation of pie chart

import matplotlib.pyplot as plt

labels = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
sizes = [30, 25, 20, 15, 10]
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen', 'violet']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
plt.axis('equal')  # Keeps the pie chart circular
plt.title("Programming Language Market Share")
plt.tight_layout()
plt.show()

Output:

[Figure: Programming Language Market Share pie chart]
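The arithmetic behind each slice is simple: a slice's share is its value divided by the total, and its angle is that share of 360 degrees. The sketch below reproduces the percentages that `autopct` prints on the chart above, using the same sample data.

```python
labels = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
sizes = [30, 25, 20, 15, 10]

total = sum(sizes)
shares = [s / total for s in sizes]          # fraction of the whole (sums to 1)
angles = [share * 360 for share in shares]   # slice angle in degrees (sums to 360)

for label, share, angle in zip(labels, shares, angles):
    print(f"{label}: {share:.1%} of the pie, {angle:.0f} degrees")
```

For example, Python's 30 out of 100 becomes 30.0% of the pie and a 108-degree slice.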

Scatter Plots

A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.


Python implementation of scatter plot

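import numpy as np
import matplotlib.pyplot as plt

# 30 random integers for x; y follows x plus random noise in [-10, 10)
x = np.random.randint(10, 50, 30)
y = x + np.random.randint(-10, 10, 30)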

plt.scatter(x, y, color='purple')
plt.title("Scatter Plot: X vs Y")
plt.xlabel("X-axis values")
plt.ylabel("Y-axis values")
plt.grid(True)
plt.tight_layout()
plt.show()

Output:

[Figure: Scatter Plot: X vs Y]

Bubble Charts

A bubble chart extends a scatter plot to show relationships between three numeric variables: two set the position of each bubble and the third sets its size. This makes it a useful tool for examining relationships between key business indicators, such as cost, value and risk.


Python implementation of Bubble chart

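import numpy as np
import matplotlib.pyplot as plt

# Random illustrative data: x = cost, y = value, bubble area = risk
x = np.random.rand(20) * 100
y = np.random.rand(20) * 100
sizes = np.random.rand(20) * 1000  # Bubble size (marker area in points^2)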

plt.scatter(x, y, s=sizes, alpha=0.5, color='tomato')
plt.title("Bubble Chart: Cost vs Value with Risk Bubble Size")
plt.xlabel("Cost")
plt.ylabel("Value")
plt.grid(True)
plt.tight_layout()
plt.show()

Output:

[Figure: Bubble Chart: Cost vs Value with Risk Bubble Size]

Box Plots

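A box plot is a standardized way of displaying a dataset based on the five-number summary: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. The box spans Q1 to Q3, with a line at the median. Here the minimum (Q0, or 0th percentile) is the lowest data point in the data set excluding any outliers, and the maximum is likewise the highest non-outlier point.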


Python implementation of box plot

import matplotlib.pyplot as plt
import seaborn as sns

# Sample numerical data
data = [55, 60, 62, 70, 75, 78, 79, 80, 82, 85, 90, 95, 100, 110, 115, 120]

sns.boxplot(data=data, color='skyblue')
plt.title("Box Plot of Values")
plt.xlabel("Distribution")
plt.tight_layout()
plt.show()

Output:

[Figure: Box Plot of Values]

Whisker Plots

A Box and Whisker Plot (or Box Plot) is a convenient way of visually displaying groups of numerical data through their quartiles.


The "whisker" in a box-and-whisker plot (commonly just called a box plot) refers to the lines extending from the box. They represent the variability outside the upper and lower quartiles and help visualize spread, skewness, and outliers in a dataset.
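To see where the box and whiskers come from, the sketch below computes them by hand for the box-plot sample data. It assumes the common Tukey convention, which is matplotlib's default (`whis=1.5`): whiskers reach the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. It also assumes numpy's default linear-interpolation percentiles; other tools may compute quartiles slightly differently.

```python
import numpy as np

data = [55, 60, 62, 70, 75, 78, 79, 80, 82, 85, 90, 95, 100, 110, 115, 120]

# Quartiles with numpy's default (linear interpolation) method
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the fences
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, median, q3)        # 73.75 81.0 96.25
print("Whiskers:", whisker_low, whisker_high)   # 55 120
print("Outliers:", outliers)                    # []
```

Here the fences sit at 40 and 130, so no point is an outlier and the whiskers reach the actual minimum and maximum.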


Clustering and Segmentation Techniques:

These are unsupervised learning methods used to group similar data points based on patterns or shared characteristics without predefined labels.


K-Means Clustering

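K-means partitions the data into k clusters, each represented by its centroid (the mean of its points). The method is a local search that iteratively relocates samples into different clusters as long as this improves the objective function, the within-cluster sum of squared distances to the centroids. When implemented in Python, the plot for 4 clusters looks like the figure below: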

[Figure: k-means clustering with 4 clusters]
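The following is a minimal from-scratch sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points), one common way of carrying out the local search described above. The `kmeans` helper and the toy points are hypothetical; for real work, scikit-learn's `sklearn.cluster.KMeans` is the usual choice.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Two hypothetical, well-separated groups of 2-D points
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = kmeans(points, k=2)

# Objective: within-cluster sum of squared distances (inertia)
inertia = ((points - centroids[labels]) ** 2).sum()
print(labels, centroids, inertia)
```

Because the result depends on the starting centroids, practical implementations usually run several initializations and keep the one with the lowest inertia.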

Hierarchical Clustering

Hierarchical clustering is a popular method for grouping objects. It creates groups so that objects within a group are similar to each other and different from objects in other groups. Clusters are visually represented in a hierarchical tree called a dendrogram.

[Figure: Hierarchical Clustering in Python]
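A dendrogram records which clusters were merged and at what distance. The sketch below shows the agglomerative (bottom-up) version with single linkage, where the distance between two clusters is the distance between their closest members. The `single_linkage` helper and the points are hypothetical, and this brute-force loop is only meant for tiny datasets.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (members, distance) pairs: which clusters
    joined and at what height, which is what a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two closest clusters (closest pair of members)
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        del clusters[b]  # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return merges

# Hypothetical points on a line, so the distances are easy to check by eye
points = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0), (10.5, 0.0), (25.0, 0.0)]
merges = single_linkage(points)
for members, height in merges:
    print(f"merge {members} at distance {height}")
```

Cutting this tree below a height of 7.0 leaves three clusters: points {0, 1}, {2, 3} and {4}. With SciPy, `scipy.cluster.hierarchy.linkage(..., method="single")` followed by `dendrogram` computes and draws the same kind of tree.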

Other Related Techniques used in Descriptive Analytics are Listed Below


Correlation Matrix: A correlation matrix evaluates the relationships between pairs of variables in a data set. The matrix is a table in which every cell contains the correlation coefficient (typically Pearson's r) for one pair of variables: 1 indicates a perfect positive linear relationship, 0 no linear relationship, and -1 a perfect negative linear relationship.
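A short sketch with numpy's `np.corrcoef`, which treats each row as one variable and returns the matrix of Pearson coefficients. The four small variables are made up so that the expected values are easy to check: `y` rises exactly with `x`, `z` falls exactly as `x` rises, and `w` only loosely follows `x` (r = 0.8).

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfectly increasing with x
z = [5, 4, 3, 2, 1]    # perfectly decreasing with x
w = [2, 1, 4, 3, 5]    # loosely increasing with x

# Each row is one variable; the result is a symmetric 4 x 4 matrix
matrix = np.corrcoef([x, y, z, w])
print(np.round(matrix, 2))
```

With tabular data in pandas, `DataFrame.corr()` produces the same kind of matrix for all numeric columns.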


Frequency Tables: A frequency table shows how many observations fall into each value or category of a variable. Frequency tables help you see which values occur more or less often in the dataset.
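A frequency table can be built with `collections.Counter` from the standard library. The survey responses below are hypothetical; the relative frequency column is each count divided by the total.

```python
from collections import Counter

responses = ["red", "blue", "red", "green", "blue", "red"]  # hypothetical data

freq = Counter(responses)
total = sum(freq.values())

print("Value\tCount\tRelative")
for value, count in freq.most_common():  # most frequent first
    print(f"{value}\t{count}\t{count / total:.2f}")
```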


Percentiles & Quartiles: Percentiles are quantiles that divide ordered data into 100 equal parts. The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile (Q2), and the 75th percentile as the third quartile (Q3).
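Python's `statistics.quantiles` (Python 3.8+) computes quartiles and percentiles directly. This sketch reuses the sample data from the central tendency example; `method="inclusive"` is an assumption that treats the data as the whole population and should agree with numpy's default linear method, while the default `"exclusive"` method can give slightly different cut points.

```python
import statistics

data = [10, 15, 10, 20, 25, 30, 10, 20, 25]

# n=4 returns the three cut points Q1, Q2 (median) and Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=100 returns the 99 percentile cut points; index 89 is the 90th percentile
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)  # Q1: 10.0 Q2 (median): 20.0 Q3: 25.0
print("90th percentile:", p90)                   # 26.0
```

Note that Q2 matches the median of 20 computed earlier.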


Cross-Tabulations: A cross-tabulation is a two- (or more) dimensional table that records the number (frequency) of respondents that have the specific combination of characteristics described in each cell of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables.
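A two-way cross-tabulation can be sketched by counting (row, column) pairs with `collections.Counter`. The survey responses are hypothetical; with pandas, `pandas.crosstab` builds the same table from two columns.

```python
from collections import Counter

# Hypothetical survey responses: (preferred language, experience level)
responses = [
    ("Python", "Beginner"), ("Python", "Expert"), ("Java", "Beginner"),
    ("Python", "Beginner"), ("Java", "Expert"), ("Go", "Beginner"),
]

counts = Counter(responses)                        # frequency of each combination
rows = sorted({lang for lang, _ in responses})
cols = sorted({level for _, level in responses})

# Every cell holds the count for one (row, column) combination, 0 if absent
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print("", *cols, sep="\t")
for r in rows:
    print(r, *table[r].values(), sep="\t")
```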


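Pivot Tables: A pivot table quickly summarizes large amounts of data by grouping rows by one or more fields and aggregating a value field (for example with a sum, mean, or count) for each group. Spreadsheet tools make pivot tables interactive, so the groupings can be rearranged on the fly.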
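The sketch below builds a small pivot table with the standard library: hypothetical sales rows are grouped by region (rows) and quarter (columns), and the amounts are summed in each cell. In pandas, `DataFrame.pivot_table(index="region", columns="quarter", values="amount", aggfunc="sum")` expresses the same operation.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, quarter, amount)
sales = [
    ("North", "Q1", 100), ("North", "Q2", 150), ("South", "Q1", 80),
    ("South", "Q2", 120), ("North", "Q1", 50), ("South", "Q2", 30),
]

# pivot[region][quarter] accumulates the summed amount
pivot = defaultdict(lambda: defaultdict(int))
for region, quarter, amount in sales:
    pivot[region][quarter] += amount  # aggregation function: sum

quarters = sorted({q for _, q, _ in sales})
print("Region", *quarters, sep="\t")
for region in sorted(pivot):
    print(region, *(pivot[region][q] for q in quarters), sep="\t")
```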


Word Frequency Analysis: Word frequency analysis is the most basic and common method of qualitative text data analysis. It involves counting up mentions of a particular word or phrase as a means of understanding the dominant topics in a particular data set.
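A minimal word-frequency sketch with the standard library. The sample sentence is made up, and the simple regular-expression tokenizer is an assumption; real analyses usually also remove common stop words such as "the" and "and" before counting.

```python
import re
from collections import Counter

text = "Data drives decisions. Good data, good decisions, better outcomes."

# Lowercase and keep runs of letters, so "Data" and "data," count as one word
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)

print(freq.most_common(3))  # [('data', 2), ('decisions', 2), ('good', 2)]
```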


EDA: Exploratory Data Analysis (EDA) is an analysis approach that identifies general patterns in the data. These patterns include outliers and features of the data that might be unexpected. EDA is an important first step in any data analysis.


Conclusion

Descriptive analytics lays the foundation for deeper data understanding by transforming raw numbers into meaningful insights. Through techniques like measures of central tendency, dispersion, and visualizations such as bar charts, line graphs, and box plots, you gain clarity about patterns within your data. Combined with clustering and segmentation, these methods enable better categorization and interpretation. With Python as your tool, you're now equipped to explore, summarize, and communicate data effectively—setting the stage for more advanced analytics and decision-making.

Each technique serves a distinct purpose in describing the data, and together they enable data analysts and decision-makers to explore, summarize, and interpret historical data effectively.

