Absenteeism in The Workplace: Insights from a Data-Analysis Project

Seun Adegbola
5 min readFeb 18, 2024

--

Image source

Absenteeism in the workplace is a critical issue impacting productivity, efficiency, and overall functioning of an organization. It disrupts workflow, decreases productivity, and strains team dynamics. Moreover, it poses significant financial challenges for organizations. According to the Center for Disease Control and Prevention Foundation, absenteeism costs employers $225.8 billion annually in terms of lost productivity.

Understanding the factors contributing to absenteeism is essential for implementing effective strategies to to address and reduce its occurrence.

In this data analysis project focused on HR absenteeism, I explored the dataset comprising three tables: Absenteeism_at_work, Reasons, and Compensation using Microsoft SQL Server for data analysis and PowerBI for visualization

The Steps for the project has been divided into :

  • Project objective
  • Build a database
  • Analysis
  • Connect Database to Powerbi
  • Visualization

Project objective

The project objectives are multifaceted, aiming to provide insights into employee health, compensation, reasons for absenteeism, and seasonal variations.

The project aims to achieve the following goals:

  1. Provide a list of individuals maintaining good health and exhibiting low absenteeism for eligibility in a healthy bonus program, with a total budget of $1000
  2. Calculate the wage increase or annual compensation for non-smokers within a budget of $931,221 allocated for all non-smokers to cover insurance costs.
  3. Identify and analyze the most common reasons contributing to employees’ absenteeism.
  4. Identify the age groups or specific categories of individuals most significantly affected by absenteeism in the workplace.

Building the database

To get started a created a database ‘work’ where the Absenteeism_at_work , Reasons and compensation were stored

CREATE DATABASE work;
‘work’ database and the tables

Analysis

To answer the first question — Provide a list of healthy individuals and low absenteeism for our healthy bonus program .

From the dataset , a peak at the following columns : Social_smoker, Social_drinker, Body_mass_index shows that there may be individuals who are smokers, drinkers and above the recommended body mass index. So we define a healthy individual as someone :

Who is not a social smoker or social drinker, has a body mass index less than 25 and absenteeism hours which is lower than the average absenteeism time .

WITH healthy_workers AS(
SELECT * FROM absenteeism
WHERE Social_drinker = 0 AND Social_smoker = 0
AND Body_mass_index < 25 AND Absenteeism_time_in_hours <
(SELECT AVG(Absenteeism_time_in_hours) FROM absenteeism)
)
SELECT DISTINCT COUNT(ID)
FROM healthy_workers;

There are a total of 111 healthy workers

Next — Calculate A wage increase or annual compensation for non-smokers for insurance budget $931, 221 for all non-smokers

SELECT COUNT(*) as nonsmokers FROM absenteeism
WHERE Social_smoker = 0

There were a total of 686 non-smokers

To calculate the wage increase

Number of hours worked

(5 days a week x 8 hours a day ) x 52 weeks in a year =2080 hours

Total number of hours for 686 employees

2080 x 686 = 1,426,880 hours

Increase in worker compensation

983,221 /1,426,880 = 0.68 increase per hour

For the remaining part of the questions, I utilized PowerBI to visualize the answers.

Optimizing the query

To create a final query to connect Microsoft SQL server to Powerbi for visualization

I decided to add the following columns:

season_names— which is defined as

  • 12, 1, 2 months — winter
  • 3, 4, 5 months — Spring
  • 6, 7, 8 months — Summer
  • 9, 10, 11 months — Autumn

BMI_ClassCenter for disease control and prevention defines the adult bmi status as

  • Below 18.5 — underweight
  • 18.5–24.9 — Healthy weight
  • 25.0–29.9 overweight
  • 30.0 and above obesity

I also added age_class: 20 -29, 30–39, 40–49, 50–59 , 60s

Putting it all together

SELECT 
a.ID,
a.Reason_for_absence,
a.Day_of_the_week,
Month_of_absence,
CASE WHEN Month_of_absence IN (12,1,2) Then 'Winter'
WHEN Month_of_absence IN (3,4,5) Then 'Spring'
WHEN Month_of_absence IN (6,7,8) Then 'Summer'
WHEN Month_of_absence IN (9,10,11) Then 'Autumn'
Else 'unknown' END as Season_names,
Body_mass_index,
CASE WHEN Body_mass_index < 18.5 Then 'Underweight'
WHEN Body_mass_index BETWEEN 18.5 and 24.9 Then 'Health'
WHEN Body_mass_index BETWEEN 25 and 30 Then 'Overweight'
WHEN Body_mass_index > 30 Then 'Obese'
ELSE 'Unknown' end as BMI_Class,
Age,
CASE WHEN Age BETWEEN 20 and 29 Then '20-29'
WHEN Age BETWEEN 30 and 39 Then '30-39'
WHEN Age BETWEEN 40 and 49 Then '40-49'
WHEN Age BETWEEN 50 and 59 Then '50-59'
WHEN Age >= 60 Then '60s'
ELSE 'Unknown' end as Age_Class,
a.Seasons,
a.Transportation_expense,
a.Distance_from_Residence_to_Work,
a.Son,
a.Education,
a.Social_drinker,
a.Social_smoker,
a.Absenteeism_time_in_hours,
a.Service_time,
a.Work_load_Average_day,
a.Hit_target,
a.Disciplinary_failure,
a.Weight,
a.Height,
a.Pet,
c.comp_hr,
r.Reason
FROM absenteeism a
LEFT JOIN compensation c
ON a.ID = c.ID
LEFT JOIN Reasons r
on a.Reason_for_absence = r.Number;

Connect Database to Powerbi

To complete the final phase of this analysis, I connected the ‘work’ database to Powerbi and included the final SQL query.

connecting the database

Then I loaded the table

Dashboard

Insights

  • There are a total of 740 employees, with a total absenteeism of 5,124 hours and an average of 6.92 hours.
  • Non-smokers constitute 92.7% of the workforce, while smokers make up 7.3%.
  • Social drinkers account for 56.76% of employees, with non-drinkers comprising 43.24%.
  • The 30–39 age group contributes to 55.7% of total absenteeism hours 2855 hours
  • Overweight individuals (BMI class) make up 46.47% of absenteeism, totaling 2,381 hours.
  • Medical consultation, dental consultation, and physiotherapy are the top three reasons for absenteeism.
  • The 7th Month ( July) had the highest average absenteeism time in hours at 10.96 hours followed by April with 9.09 hours
  • Monday had the highest average absenteeism by day of the week at 9.25 hours.

Recommendations

Employees should be encouraged to undergo regular health check-ups and incentives can also be offered for preventive measures like vaccinations. This approach seeks to address health concerns early, minimizing the impact of common reasons for absenteeism, particularly related to medical and dental consultations.

Work plans and important events should be adjusted to keep in mind the times of the year when fewer people tend to be absent.This helps manage the workforce better, especially during months with historically lower absenteeism.

Stronger efforts to help employees quit smoking should be encouraged. This can include information sessions, support for quitting, and rewards for those who successfully quit. With this, the workplace would be healthier by reducing smoking-related absenteeism.

Health program with activities and workshops specifically for employees aged 30–39 and those in the overweight BMI class should be encouraged.

Thank you for reading! feel free to connect with me on LinkedIn and Twitter. You can also check out my GitHub here.

#Powerbi #datafam #dataanalyst #SQL

--

--

Seun Adegbola
Seun Adegbola

Written by Seun Adegbola

Transforming raw data into meaningful insights. Telling stories hidden in data

No responses yet