Project 1: Waze User Churn Analysis
This project explores user behavior and churn patterns for the Waze navigation app. The goal was to identify factors associated with users leaving the platform, helping inform retention strategies. Using a real dataset of 14,999 users, I conducted an exploratory data analysis (EDA) to uncover insights about engagement, device usage, and driving patterns.
Key Steps
- Data Cleaning & Missing Values: The dataset contained 700 rows with a missing churn label. Analysis confirmed no other significant missing values, and the missingness pattern was examined to check for potential bias.
- Descriptive Analysis: I summarized usage metrics, including sessions, drives, total kilometers driven, and duration per drive. Comparisons between retained and churned users highlighted key differences.
- Device Analysis: Users were split between Android and iPhone devices. Interestingly, device type did not significantly influence churn rates.
- Behavioral Metrics: Median kilometers per drive and per driving day were calculated, revealing that retained users drove slightly more per drive and per driving day than churned users. Drive frequency per driving day offered a similar window into engagement (these calculations are sketched just after this list).
- Visualization & Summary: The results were presented through clear tables and visualizations to illustrate trends, supporting actionable insights for product decisions.
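To make these steps concrete, here is a minimal sketch of the missing-label check and the behavioral-metric calculations. The filename and column names (`label`, `device`, `drives`, `driving_days`, `driven_km_drives`) are assumptions for illustration, not the exact schema of the original dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical filename and column names, for illustration only
df = pd.read_csv("waze_dataset.csv")

# Quantify missing churn labels and check whether they cluster by device
print(df["label"].isna().sum())                        # expect ~700
print(df.loc[df["label"].isna(), "device"].value_counts())

# Avoid division by zero for users with no recorded drives or driving days
drives = df["drives"].replace(0, np.nan)
driving_days = df["driving_days"].replace(0, np.nan)

# Per-user behavioral metrics
df["km_per_drive"] = df["driven_km_drives"] / drives
df["km_per_driving_day"] = df["driven_km_drives"] / driving_days
df["drives_per_driving_day"] = df["drives"] / driving_days

# Compare medians for retained vs. churned users
metrics = ["km_per_drive", "km_per_driving_day", "drives_per_driving_day"]
print(df.groupby("label")[metrics].median())
```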
Key Insights
- Retained users exhibited higher engagement, slightly more kilometers per drive, and a higher median number of drives per day.
- Churned and retained users were evenly split across device types, suggesting that Android vs. iPhone did not drive user retention differences.
- Missing labels appeared randomly across devices, with no strong pattern to suggest a data bias.
- The analysis highlighted a “super-driver” segment of unusually high-frequency users, prompting further investigation into their unique needs.
Outcome & Next Steps
This analysis provides a foundation for developing predictive models to identify at-risk users. Next steps include deeper feature engineering, additional data collection on high-frequency users, and implementing machine learning models to forecast churn probabilities.
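As a loose illustration of what a first baseline might look like (a sketch of a possible next step, not part of the completed analysis; it reuses the hypothetical DataFrame and column names from the earlier sketch):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Drop the rows whose churn label is missing before modeling
labeled = df.dropna(subset=["label"])

# Placeholder features; proper feature engineering is a next step
features = ["sessions", "drives", "driven_km_drives", "driving_days"]
X = labeled[features]
y = (labeled["label"] == "churned").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```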
Tools Used: Python (pandas, NumPy), Jupyter Notebook, HTML export for reporting.
Project 2: Unicorn Companies Analysis
In this project, I explored a dataset of unicorn companies to analyze patterns in industry growth, time to achieve unicorn status, and valuation trends. The goal was to uncover insights into how different industries foster high-growth startups and to identify companies that achieved exceptional milestones.
Using Python and pandas, I first sampled 50 unicorn companies from the dataset to create a manageable subset for visualization and analysis. I then converted key columns into usable types: “Date Joined” was parsed as a datetime, which let me calculate the time each company took to reach unicorn status by subtracting the year founded from the year joined, and valuation strings like “$1.2B” or “$800M” were converted into numeric values so maximum valuations could be compared across industries.
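A minimal sketch of these transformations, assuming illustrative column names such as `Date Joined`, `Year Founded`, and `Valuation` (the actual headers in the dataset may differ):

```python
import pandas as pd

# Hypothetical filename and column names, for illustration only
df = pd.read_csv("unicorn_companies.csv")
sample = df.sample(n=50, random_state=42)  # manageable subset

# Parse the join date and derive years from founding to unicorn status
sample["Date Joined"] = pd.to_datetime(sample["Date Joined"])
sample["Years To Unicorn"] = sample["Date Joined"].dt.year - sample["Year Founded"]

def parse_valuation(text):
    """Convert strings like '$1.2B' or '$800M' to billions of dollars."""
    text = text.strip().lstrip("$")
    if text.endswith("B"):
        return float(text[:-1])
    if text.endswith("M"):
        return float(text[:-1]) / 1000
    return float(text)

sample["Valuation ($B)"] = sample["Valuation"].apply(parse_valuation)
```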
Data visualization played a crucial role in this analysis. I used matplotlib.pyplot to generate bar charts showing both the longest time to unicorn status per industry and the maximum company valuations by industry. This approach helped highlight patterns, such as industries where startups tend to scale faster or achieve higher valuations. Grouping data by industry provided a clear overview while preserving individual company insights for deeper analysis.
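Continuing the sketch above, the industry-level bar charts could be produced along these lines (again using the assumed column names, plus an assumed `Industry` column):

```python
import matplotlib.pyplot as plt

# Longest time to unicorn status per industry
years_by_industry = sample.groupby("Industry")["Years To Unicorn"].max()
years_by_industry.sort_values().plot(kind="barh")
plt.xlabel("Maximum years to unicorn status")
plt.title("Longest time to unicorn status by industry")
plt.tight_layout()
plt.show()

# Maximum company valuation per industry
valuation_by_industry = sample.groupby("Industry")["Valuation ($B)"].max()
valuation_by_industry.sort_values().plot(kind="barh")
plt.xlabel("Maximum valuation ($B)")
plt.title("Maximum company valuation by industry")
plt.tight_layout()
plt.show()
```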
The project demonstrates skills in data cleaning, transformation, exploratory data analysis, and visualization, as well as the ability to communicate findings clearly through visual storytelling. It is particularly relevant for stakeholders interested in startup trends, venture capital, and growth strategy.
The full code, dataset, and visualizations for this project are available on my GitHub and Kaggle repositories for reference and replication:
- GitHub: [GitHub link]
- Kaggle: [Kaggle link]
- HTML: [Click Here]
This analysis highlights my ability to turn raw data into actionable insights, making it a strong example of my data analytics and visualization capabilities for portfolio presentation.
Project 3: NOAA Lightning Data Analysis
This project explores NOAA lightning strike data using Python to perform exploratory data analysis (EDA) and time-based visualization. The primary objective is to examine temporal patterns in lightning activity and demonstrate practical data preprocessing techniques used in real-world analytical workflows.
A key component of the analysis involves converting the dataset’s date column into a proper datetime format using pandas. This transformation enables accurate time-series plotting, filtering, and aggregation. Once converted, the data can be grouped by day, month, quarter, or year to uncover trends and seasonal patterns in lightning occurrences.
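A short sketch of this step, assuming an illustrative `date` column and a `number_of_strikes` count column (the real column names may differ):

```python
import pandas as pd

# Hypothetical filename and column names, for illustration only
df = pd.read_csv("noaa_lightning_strikes.csv")

# Convert the date column from strings to proper datetime objects
df["date"] = pd.to_datetime(df["date"])

# Derive analyzable time components for grouping and filtering
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month_name()
df["quarter"] = df["date"].dt.to_period("Q").astype(str)
```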
The project walks through the full analytical pipeline: loading and inspecting the dataset, cleaning and preparing variables, transforming date fields, aggregating strike counts, and generating visualizations to support interpretation. By restructuring the time variable into analyzable components, the analysis highlights how datetime conversion is essential for meaningful temporal insights.
Using pandas for data manipulation and matplotlib for visualization, the project demonstrates how structured aggregation and plotting can reveal fluctuations in lightning frequency across months and years. The results help identify recurring seasonal peaks and variations in activity over time.
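Continuing the sketch, a monthly aggregation and bar chart might look like this:

```python
import matplotlib.pyplot as plt

# Aggregate total strikes per calendar month; a PeriodIndex keeps
# the months in chronological order
monthly = df.groupby(df["date"].dt.to_period("M"))["number_of_strikes"].sum()

monthly.plot(kind="bar")
plt.xlabel("Month")
plt.ylabel("Total lightning strikes")
plt.title("Monthly lightning strike totals")
plt.tight_layout()
plt.show()
```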
- GitHub: [GitHub link]
- Kaggle: [Kaggle link]
- HTML: [Click Here] / [Click Here]
