Caroline W. Mugambi

Data Science/Analytics Portfolio

Welcome to my Portfolio

As a dedicated and passionate data professional, I have gathered a diverse range of projects that exemplify my expertise in data analysis and data science. With a strong foundation in statistics, programming, and data manipulation, I am committed to translating raw data into actionable insights that drive informed decision-making.

About me

I am a Data Analyst on a relentless quest to transform raw information into valuable insights that help organizations uncover trends and patterns, make predictions and guide decision-making. I am proficient in Microsoft Excel, Tableau, Structured Query Language (SQL), Power BI, QGIS, Python and R. I have experience in data collection, data cleaning, exploratory data analysis, machine learning, and building dashboards, reports and other visualizations that provide insights for making smart and sustainable decisions. With a background in data science and analysis, I thrive on the challenge of extracting meaning from complex datasets and solving real-world problems.

My Projects

Credit Card Fraud Detection


As electronic commerce grows rapidly and gains significant impact across the world, the credit card has become a de facto standard for paying for goods and services. Unfortunately, this has led to rapid growth in credit card fraud, making it a big problem for consumers, financial institutions and law enforcement agencies. Below is a project in which I used machine learning algorithms to predict credit card fraud.

Accuracy Scores:
Random Forest – 99.81%
Decision Tree – 98.27%
Support Vector Machine – 89.2%
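
For context, an accuracy score such as those above is the share of predictions that match the true labels. Below is a minimal sketch in plain Python. The function name `accuracy` and the sample labels are made up for illustration and are not taken from the project's data or notebook.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = fraudulent transaction, 0 = legitimate transaction.
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

print(f"Accuracy: {accuracy(y_true, y_pred):.1%}")
```

In this made-up sample, 9 of the 10 predictions match, even though one of the two fraudulent transactions is missed.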

Data obtained from www.kaggle.com.

Churn Analytics

Customer Churn Analysis Using Python
Below is a project in which I used 3 machine learning algorithms (Decision Tree, Random Forest and XGBoost) to predict customer churn. The models performed as follows:
XGBoost – 93.5%
Random Forest – 82.1%
Decision Tree – 78%

The models attributed importance to the following features in determining whether customers would stop or continue doing business with the company: number of products held by the customer, whether the customer is an active member, the customer’s age and the balance in the customer’s account.
Below is a copy of the Jupyter notebook detailing the analysis, model fitting and accuracy scores.

Data obtained from www.kaggle.com.

Disbursement of Funds to Counties in Kenya

Since the inception of devolution in 2013, Kenya has made great strides in achieving equitable distribution of funds among the counties. Below is a visual I created using QGIS to show county disbursements from 2013 to 2020.

Data obtained from the National Treasury and Planning government website.

Electronics Sales Analysis

Power BI is a powerful tool for creating visuals from data, which organizations can use to gain valuable insights and make smart, sustainable decisions. What’s more, Power BI Desktop is free to download, making it easily accessible and convenient to use.

Below is a dashboard I created using Power BI to analyze performance in a company that sells electronics. Some of the insights gained include:

1. In the period under review, the company had total sales of $8.34b and a net profit of $4.79b after costs that totaled $3.55b. The company may need to look into ways of reducing its costs, which consume more than 42% of its sales income.

2. Sales dropped from $3.1b in the first year under review to $2.5b in the third year. Sales tend to peak in the 2nd and 4th quarters of each year, while the 1st and 3rd quarters record lower sales.

3. Stores/physical shops accounted for more sales than the other channels, that is, online, reseller and catalog sales.

4. The most popular products were Computers, followed by Cameras, and then Television sets. Cell phones and music gadgets recorded lower sales in the period under review.

5. Geographically, North America had the most sales, followed by Asia and then Europe. The USA, China, France, the United Kingdom and Canada were the top 5 countries by sales, in that order.
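
The headline figures in points 1 and 2 can be checked with a few lines of arithmetic. The sketch below is in Python and uses only the rounded figures quoted above. The variable names are illustrative, and the results are approximate because the inputs are rounded.

```python
# Figures quoted in the insights above, in billions of US dollars.
total_sales = 8.34
total_costs = 3.55

# Net profit is what remains of sales after costs.
net_profit = total_sales - total_costs

# Share of sales income consumed by costs.
cost_share = total_costs / total_sales

# Sales in the first and third years under review.
sales_year_1 = 3.1
sales_year_3 = 2.5
sales_drop = (sales_year_1 - sales_year_3) / sales_year_1

print(f"Net profit: ${net_profit:.2f}b")
print(f"Costs as a share of sales: {cost_share:.1%}")
print(f"Drop in sales from year 1 to year 3: {sales_drop:.1%}")
```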

Logistics Analysis

I really enjoyed creating the simple dashboard below using Power BI for a logistics company in Africa, covering the period between 2021 and 2022. The company contracted the services of 7 couriers, and the data had information on the collection date, contract terms (in days) and delivery date, among other variables.

I used this to calculate the expected delivery date and then compared it with the actual delivery date to determine whether the goods were delivered on time, late or not delivered at all (for the rows that had a null delivery date). I then created the simple visuals below to show each courier’s performance in terms of total number of packages delivered, delivery status and total sales/revenue realized. I also created a slicer to enable stakeholders to view this performance per year/month.
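
The delivery-status logic described above can be sketched in Python as follows. This is an illustration only, not the Power BI implementation behind the dashboard. The function name `delivery_status`, the sample rows and the rule that a delivery on the expected date counts as on time are all assumptions.

```python
from datetime import date, timedelta


def delivery_status(collection_date, contract_days, delivery_date):
    """Classify a package as 'On time', 'Late' or 'Not delivered'."""
    # A missing (null) delivery date means the package was never delivered.
    if delivery_date is None:
        return "Not delivered"
    # Expected delivery date = collection date + contract terms (in days).
    expected = collection_date + timedelta(days=contract_days)
    # Assumption: delivery on the expected date itself counts as on time.
    return "On time" if delivery_date <= expected else "Late"


# Hypothetical rows: (collection date, contract terms in days, delivery date).
packages = [
    (date(2021, 3, 1), 5, date(2021, 3, 4)),
    (date(2021, 3, 1), 5, date(2021, 3, 9)),
    (date(2022, 7, 15), 3, None),
]

for collected, terms, delivered in packages:
    print(collected, terms, delivered, delivery_status(collected, terms, delivered))
```

Counting these statuses per courier would then give the kind of breakdown shown in the visuals.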

Get in Touch

Nairobi, Kenya
+254723854249
mugamc009@gmail.com

LinkedIn – Wanjiku Mugambi