Ram Mopati

👋 Welcome to My Data Science Portfolio...

Hi, I'm Ram Mopati, an aspiring Data Scientist,

pursuing a Master of Science in Data Science at Coventry University, United Kingdom.

About Me

My name is ManojRam Mopati, a data-driven professional currently pursuing a Master of Science in Data Science at Coventry University, with modules including Big Data Analytics and Machine Learning.

I began my career at Infosys Ltd, a service-based company, in the role of Software Quality Engineer. I initially worked as a client-facing Regression Test Analyst for a short period before transitioning into a client-facing Data Analyst role, where I worked with large-scale datasets to generate insights, automate reporting pipelines, and deliver high-impact dashboards.

My professional work includes building Power BI dashboards for 5 business units, performing SQL-based data extraction and transformation, conducting exploratory data analysis (EDA), and leveraging PySpark in Azure Databricks to streamline data processing workflows.

I have also developed end-to-end data science and machine learning projects in domains such as healthcare, finance, and real estate. These include predictive modeling, classification, recommendation systems, and deep learning applications—many of which are showcased in this portfolio.

I am passionate about data storytelling—translating complex datasets into insights that drive business decisions. I thrive in collaborative, agile environments and enjoy working closely with cross-functional teams.

I am currently seeking data science internships or part-time opportunities in the UK or Europe where I can contribute using my technical skills in SQL, Python, Power BI, and cloud analytics platforms like Azure.

Outside of work, I enjoy exploring new technologies, competing in data science challenges, and contributing to open-source projects. I’m committed to continuous learning and stay updated through online courses and workshops.

This portfolio reflects my passion and capability in analyzing data, building solutions, and communicating insights effectively. Each project is a step forward in my journey to becoming a well-rounded data professional.

Skills & Tools:
Python, SQL, data collection, data analysis, EDA, statistical analysis, visualization libraries, Power BI, Scikit-learn, TensorFlow, PySpark, Databricks, Microsoft Azure, Machine Learning, Deep Learning, NLP, Time Series, Azure DevOps.
I also have experience with cloud platforms like GCP, Streamlit Community Cloud, AWS, and Azure for model deployment and managing data pipelines.

📊 Feel free to explore my GitHub code repositories and get in touch if you’re looking for a dedicated Data Science intern ready to make an impact!

Work Experience

Data Analyst

Infosys Ltd (Company Role: Software Quality Engineer), Bangalore, India.

Client: UK Insurance Project | Data Analyst

3+ years of experience | Feb 2022 - May 2025

As a Data Analyst, I played a key role in transforming raw insurance and policy data into actionable business insights, collaborating with Business Analysts, Scrum Leads, and product managers to meet evolving business requirements. I conducted in-depth exploratory data analysis (EDA), built SQL-based reports, and designed interactive Power BI dashboards that empowered business teams to make informed, data-driven decisions.

My work involved automating reporting pipelines with Azure Databricks and PySpark to speed up data processing, especially for large-scale claim and policy datasets (10M+ rows). I also developed Python scripts for data transformation, anomaly detection, and analytical modeling to support strategic decisions, and I created and maintained clear documentation of data processes, validation rules, and business logic to ensure transparency, reusability, and audit-readiness.

I actively participated in Agile/Scrum ceremonies, including sprint planning, reviews, and retrospectives, which enabled collaborative progress tracking and iterative delivery of analytics solutions. These contributions collectively led to improved data visibility, a 50% reduction in manual reporting effort, faster reporting cycles, and more consistent stakeholder engagement through high-quality, insight-driven deliverables.

Key Responsibilities:

  • Designed and deployed interactive Power BI dashboards for 5 business units, enabling faster and more informed decision-making.
  • Performed exploratory data analysis (EDA) on policy and claim datasets to uncover trends, anomalies, and patterns driving business insights.
  • Built complex SQL-based reports using Oracle SQL Developer to extract, transform, and analyze large-scale datasets (10M+ rows).
  • Collaborated with Business Analysts and Scrum Leads in England, the Netherlands, New Zealand, and Japan, taking handover of business requirements and ensuring successful delivery of analytics solutions aligned with project KPIs.
  • Contributed to Agile/Scrum ceremonies, including sprint planning and reviews, improving team alignment and delivery cycles.
  • Used Azure Databricks with PySpark to automate data processing pipelines, optimizing performance for high-volume claim data (see the sketch below).
  • Developed Python scripts for data cleaning, transformation, and anomaly detection to support business reporting workflows.
  • Documented data logic, validation rules, and transformation flows to ensure transparency, reproducibility, and smooth handovers.
  • Presented analytical insights and dashboards to business stakeholders, translating data into strategic decisions in weekly reviews.
  • Implemented data quality checks to maintain accuracy and reliability across reporting deliverables and dashboards.
  • Automated 4+ reporting pipelines using Databricks and SQL, reducing manual workload by 50% and improving efficiency.
  • Proactively identified process gaps in data workflows and implemented solutions that improved reporting accuracy and speed.
  • Handled multiple concurrent analytics tasks within Agile sprints, delivering high-quality solutions under tight deadlines.
  • Engaged in continuous learning through self-study, projects, and certifications in data science, cloud, and machine learning technologies.
  • Tools: Microsoft Power BI, Oracle SQL Developer, PySpark, Python, Azure Databricks, Azure DevOps, SQL, Excel, Agile.
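
To make the Databricks automation above concrete, here is a minimal, hypothetical sketch of the kind of PySpark reporting pipeline described. The table names, column names, and aggregation grain are illustrative assumptions, not the client's actual schema:

```python
# Hypothetical sketch of a claims-reporting pipeline in Azure Databricks.
# Table and column names are illustrative, not the actual project schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_reporting").getOrCreate()

# Read the raw claims extract (10M+ rows in the real workload).
claims = spark.table("raw.claims_raw")

# Basic cleaning: drop records with missing keys, standardise types.
cleaned = (
    claims
    .dropna(subset=["claim_id", "policy_id"])
    .withColumn("claim_date", F.to_date("claim_date", "yyyy-MM-dd"))
    .withColumn("claim_amount", F.col("claim_amount").cast("double"))
)

# Aggregate to the grain a dashboard needs: monthly totals per business unit.
monthly_summary = (
    cleaned
    .groupBy(F.date_trunc("month", "claim_date").alias("month"), "business_unit")
    .agg(
        F.count("claim_id").alias("claim_count"),
        F.sum("claim_amount").alias("total_claim_amount"),
    )
)

# Persist as a managed table that Power BI reads, replacing manual refreshes.
monthly_summary.write.mode("overwrite").saveAsTable("analytics.claims_monthly")
```

Scheduling a job like this in Databricks is what turns a manual report into an automated pipeline of the kind counted in the 50% workload reduction above.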

Projects

Enterprise Customer Churn & Retention Analytics Platform

Business Impact Summary:

Developed a production-ready machine learning web application that predicts customer churn for telecom companies with 99%+ accuracy. Built an end-to-end MLOps platform featuring real-time predictions, automated deployment, and comprehensive data analysis to help businesses proactively identify at-risk customers and implement retention strategies. The application combines advanced machine learning algorithms with modern web technologies and cloud infrastructure for scalable, reliable performance.

    Technical Features:

  • Data Processing Pipeline: Implements comprehensive data preprocessing with custom transformers for categorical encoding, numerical scaling, and feature engineering on 440K+ customer records.
  • Machine Learning Models: Trained and compared multiple algorithms including Random Forest, XGBoost, LightGBM, and SVM, achieving consistent 99%+ ROC-AUC scores.
  • Exploratory Data Analysis (EDA): Conducted thorough data analysis using Sweetviz and ydata-profiling to understand customer behavior patterns and feature relationships.
  • Model Optimization: Applied hyperparameter tuning with GridSearchCV and cross-validation techniques to maximize model performance and reliability.
  • REST API Development: Built FastAPI backend with Pydantic models for data validation, supporting both single and batch predictions with comprehensive error handling (see the sketch below).
  • API Testing & Validation: Performed comprehensive endpoint testing using Postman to validate API functionality, request/response schemas, and error handling scenarios.
  • Web Interface: Created responsive HTML frontend with Jinja2 templating for real-time customer churn predictions through user-friendly forms.
  • Docker Containerization: Containerized application with optimized Dockerfile and automated Docker image builds pushed to Docker Hub registry for version control and distribution.
  • Azure Cloud Deployment: Deployed containerized application on Azure Container Apps pulling from Docker Hub with auto-scaling and managed infrastructure.
  • CI/CD Pipeline: Implemented GitHub Actions workflow for automated testing, Docker image building, registry publishing, and seamless cloud deployment.
  • Testing Framework: Comprehensive test suite using pytest covering API endpoints and data preprocessing functionality.
  • Interactive Development: Jupyter Notebooks provide detailed model training, evaluation, and comparison with visualizations for transparency.
  • Production Monitoring: Health check endpoints and model reload functionality for production maintenance and updates.
  • Performance: Achieved 99.99% ROC-AUC score with Random Forest model on large-scale telecom dataset.

Tools: Python, FastAPI, Scikit-learn, XGBoost, LightGBM, Pandas, NumPy, Docker, Docker Hub, Azure Container Apps, CI/CD Pipelines, GitHub Actions, Postman, Jupyter Notebook, Matplotlib, Seaborn, Pytest, Jinja2, Pydantic, Git, HTML.
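
As a flavour of how the FastAPI layer fits together, below is a minimal sketch of a prediction service. The saved pipeline path, input fields, and the 0.5 decision threshold are assumptions for illustration, not the project's exact schema:

```python
# Minimal sketch of a churn prediction API with FastAPI and Pydantic.
# Field names and the model artifact path are illustrative assumptions.
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Churn Prediction API")
model = joblib.load("models/churn_pipeline.joblib")  # hypothetical artifact

class Customer(BaseModel):
    # Illustrative subset of customer features; the real schema is larger.
    tenure_months: int
    monthly_charges: float
    contract_type: str
    payment_method: str

@app.get("/health")
def health() -> dict:
    # Health-check endpoint of the kind used for production monitoring.
    return {"status": "ok"}

@app.post("/predict")
def predict(customer: Customer) -> dict:
    # Pydantic has already validated the payload; build a one-row frame
    # so the scikit-learn pipeline sees the columns it was trained on.
    frame = pd.DataFrame([customer.model_dump()])
    proba = float(model.predict_proba(frame)[0, 1])
    return {"churn_probability": proba, "churn": proba >= 0.5}
```

Validating payloads with Pydantic before they reach the model is what keeps both the single and batch endpoints robust to malformed input.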

Agricultural Disease Detection System

Business Impact Summary:

Developed a robust deep learning application to classify potato leaf diseases from image data, empowering farmers and agricultural professionals with early detection and effective disease management. This project leverages advanced computer vision and neural network techniques to streamline the identification of common potato diseases (such as early blight and late blight) from uploaded images, providing fast and accurate results for practical field use.

    Technical Features:

  • Comprehensive Image Data Pipeline: Loaded, preprocessed, and augmented a large, real-world dataset of potato leaf images (healthy and diseased) to ensure robust model training and testing.
  • Exploratory Data Analysis (EDA): Performed in-depth EDA to understand class distributions, visualize disease patterns, and guide model development.
  • Image Preprocessing: Applied transformations such as resizing, normalization, and augmentation (rotation, flipping, etc.) to enhance model generalization and performance.
  • Deep Learning Model Training: Built and trained state-of-the-art convolutional neural networks (CNNs) using TensorFlow and Keras, achieving high classification accuracy (94%) on test data (see the sketch below).
  • Performance Visualization: Visualized training/validation accuracy and loss curves, as well as confusion matrices, to evaluate and interpret model effectiveness.
  • Interactive Notebooks: Documented the entire workflow in Jupyter Notebooks, including code, visualizations, and step-by-step explanations for transparency and reproducibility.
  • Result Interpretation: Provided clear output of predicted disease class along with confidence scores for each prediction, aiding user decision-making.
  • Web Application Deployment: Deployed the trained model as an interactive web application using Streamlit, allowing users to upload images and receive instant predictions.
  • API Integration: Developed a FastAPI backend to serve model predictions via RESTful endpoints, supporting scalable and cloud-ready deployment.
  • Export and Real-World Deployment: Supported exporting trained models for integration into real-world or mobile applications, enabling practical field use.
  • Cloud & Containerization Ready: Project is structured for easy Dockerization and deployment to cloud platforms such as Azure, AWS, or GCP.
  • Code Quality & Best Practices: Utilized modular code structure, requirements files, and version control for maintainability and collaboration.
  • Reproducibility & Documentation: Provided detailed documentation and Jupyter Notebooks for transparency, reproducibility, and ease of understanding.
  • Tools: Python, TensorFlow, Keras, OpenCV, NumPy, Pandas, SciPy, Matplotlib, Seaborn, Scikit-learn, Streamlit, FastAPI, Uvicorn, Jupyter Notebook, Requests, Threading, Visualization, Deep Learning, CNN, Streamlit Community Cloud.
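
A hedged sketch of the CNN training flow, assuming a class-per-folder image directory, a 224x224 input size, and three classes (healthy, early blight, late blight); the real architecture and hyperparameters may differ:

```python
# Illustrative potato-leaf CNN in TensorFlow/Keras; paths, image size,
# and layer sizes are assumptions, not the project's exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)
NUM_CLASSES = 3  # e.g. healthy, early blight, late blight

# Load labelled images from a class-per-folder layout (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/potato_leaves/train", image_size=IMG_SIZE, batch_size=32
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/potato_leaves/val", image_size=IMG_SIZE, batch_size=32
)

model = models.Sequential([
    # Augmentation layers are active only during training.
    layers.RandomFlip("horizontal", input_shape=IMG_SIZE + (3,)),
    layers.RandomRotation(0.1),
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```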

Financial Risk Assessment & Credit Scoring Platform

Business Impact Summary:

Developed a machine learning application that predicts loan approval outcomes based on applicant data, assisting financial institutions with risk assessment and streamlined decision-making. Leveraging advanced data processing techniques and predictive modeling, this project automates the evaluation of loan applications using key financial and demographic features, providing rapid and reliable approval predictions for practical business use.

    Technical Features:

  • Data Ingestion: Loads and preprocesses real-world loan application datasets for robust model training and testing.
  • Dataset Preparation: Utilizes comprehensive datasets with features relevant to loan approval, such as applicant income, credit history, employment status, and property details.
  • Exploratory Data Analysis (EDA): Performs EDA to understand feature distributions, correlations, and visualizes important data patterns.
  • Data Preprocessing: Applies normalization, encoding, and feature engineering to enhance model performance and handle missing values.
  • Model Training: Implements state-of-the-art machine learning algorithms (e.g., logistic regression, decision trees, random forest, gradient boosting) to classify applications as approved or denied (see the sketch below).
  • Performance Visualization: Visualizes training and validation metrics, ROC curves, and confusion matrices to assess model effectiveness.
  • Interactive Notebooks: Jupyter Notebooks provide step-by-step documentation, code, and visualizations for transparency and reproducibility.
  • Result Interpretation: Offers clear output of predicted loan status along with probability/confidence scores for each application.
  • Export and Deployment: Supports exporting trained models for integration with web or mobile applications, including demonstration via Streamlit Community Cloud.
  • Performance: Achieved accuracy of 92%.

Tools: Jupyter Notebook, Python, Scikit-Learn, NumPy, Pandas, Matplotlib/Seaborn, Data Visualization, Machine Learning, Streamlit Community Cloud.
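
The preprocessing-plus-model flow can be sketched as a single scikit-learn pipeline. The column names and CSV path below are placeholders standing in for the actual loan dataset:

```python
# Illustrative loan-approval pipeline; column names and the data path
# are hypothetical stand-ins for the project's real dataset.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("data/loan_applications.csv")  # hypothetical path
X = df.drop(columns=["loan_status"])
y = df["loan_status"]

numeric = ["applicant_income", "loan_amount", "credit_history"]
categorical = ["employment_status", "property_area"]

# Impute, scale, and encode inside the pipeline so the same steps run
# identically at training and prediction time.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

clf = Pipeline([("prep", preprocess),
                ("model", RandomForestClassifier(n_estimators=300,
                                                 random_state=42))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
clf.fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")
```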

Luxury Stones Market Valuation Engine

Business Impact Summary:

Developed a machine learning project designed to estimate gemstone prices based on characteristics such as carat, depth, table, dimensions, cut, color, and clarity. The project features a complete data pipeline, from data ingestion and preprocessing to advanced model training and deployment.

    Technical Features:

  • Data Ingestion: Utilizes Kaggle's Gemstone dataset, which includes various gemstone attributes and their corresponding prices.
  • Exploratory Data Analysis (EDA): Conducts thorough EDA to understand data distributions, relationships, and potential outliers, using visualizations like scatter plots, histograms, and box plots.
  • Feature Engineering: Implements feature engineering techniques to create new features, such as calculating volume from dimensions, and applies transformations to improve model performance.
  • Data Preprocessing: Handles missing values, scales numerical features, and encodes categorical variables to prepare the dataset for modeling.
  • Model Training: Employs a variety of regression algorithms, including CatBoost, XGBoost, and KNN, to predict gemstone prices. These models are combined using a Voting Regressor to enhance accuracy (see the sketch below).
  • Hyperparameter Tuning: Utilizes GridSearchCV to optimize model parameters, ensuring the best performance for the ensemble model.
  • Model Evaluation: Evaluates model performance using metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared, ensuring robust predictions.
  • Deployment: The application is deployed on AWS Elastic Beanstalk with a REST API for integration and testing.
  • Web Application: Provides an interactive Flask-based web interface where users can input gemstone features and instantly receive a price prediction.
  • Explainability: Integrates LIME for model interpretation to help users understand the factors influencing predictions.
  • Performance: Achieved accuracy of 98%.

Tools: Data Ingestion, EDA, Visualization, Probability & Statistics, Feature Engineering, Feature Scaling, Feature Selection, Data Preprocessing, Python Libraries, Machine Learning, AWS Cloud Deployment.
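
A minimal sketch of the voting ensemble described above, assuming the categorical gemstone attributes (cut, color, clarity) have already been encoded numerically; the data path and hyperparameters are illustrative:

```python
# Illustrative CatBoost + XGBoost + KNN voting ensemble for price regression.
# The data path and hyperparameters are assumptions, not the tuned values.
import pandas as pd
from catboost import CatBoostRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor

df = pd.read_csv("data/gemstones.csv")  # hypothetical path to the Kaggle data
X = df.drop(columns=["price"])  # assumes cut/color/clarity already encoded
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Averaging three diverse regressors tends to smooth out individual errors.
ensemble = VotingRegressor([
    ("cat", CatBoostRegressor(verbose=0, random_state=42)),
    ("xgb", XGBRegressor(n_estimators=500, random_state=42)),
    ("knn", KNeighborsRegressor(n_neighbors=10)),
])
ensemble.fit(X_train, y_train)

preds = ensemble.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, preds):.2f}")
print(f"R^2: {r2_score(y_test, preds):.3f}")
```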

Real Estate Investment Analytics System

Business Impact Summary:

Developed a machine learning web application designed to estimate house prices in the Boston area based on multiple property features. Leveraging regression techniques and an ensemble of advanced models (including CatBoost, XGBoost, and KNN), this project guides users through predicting home values by inputting relevant attributes such as crime rate, number of rooms, tax rate, and more.

    Technical Features:

  • Data Ingestion: Loads and splits real-world housing data for training and testing.
  • Exploratory Data Analysis (EDA): Conducts thorough EDA to understand data distributions, relationships, and potential outliers, using visualizations like scatter plots, histograms, and box plots.
  • Feature Engineering: Implements feature engineering techniques to create new features and applies transformations to improve model performance.
  • Data Preprocessing: Handles missing values, scales numerical features, and encodes categorical variables to prepare the dataset for modeling.
  • Model Training: Employs a variety of regression algorithms, including CatBoost, XGBoost, and KNN, to predict house prices. These models are combined using a Voting Regressor to enhance accuracy.
  • Hyperparameter Tuning: Utilizes GridSearchCV to optimize model parameters, ensuring the best performance for the ensemble model.
  • Model Evaluation: Evaluates model performance using metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared, ensuring robust predictions.
  • Web Application: Features both Flask and Streamlit interfaces, allowing users to input data via a user-friendly form and instantly receive price predictions.
  • Model Serialization: The trained models and scalers are saved as pickle files for deployment.
  • API Integration: The Flask app exposes a /predict_api endpoint for programmatic prediction (see the sketch below).
  • Comprehensive User Input: Supports detailed input for all relevant property features, ensuring accurate and personalized predictions.
  • Performance: Achieved accuracy of 99%.

Tools: Data Ingestion, EDA, Data Preprocessing, Visualization, Python Libraries, Machine Learning, Streamlit Community Cloud.
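
The /predict_api endpoint mentioned above can be sketched as follows; the pickle filenames, JSON payload shape, and feature order are assumptions rather than the project's exact artifacts:

```python
# Minimal sketch of the Flask prediction endpoint; artifact names and the
# payload format are illustrative assumptions.
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the serialized scaler and model produced during training.
with open("models/scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
with open("models/house_price_model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict_api", methods=["POST"])
def predict_api():
    # Expects JSON like {"data": {"CRIM": 0.1, "RM": 6.5, ...}} with
    # features in the same order used at training time.
    payload = request.json["data"]
    features = scaler.transform(
        np.array(list(payload.values())).reshape(1, -1))
    prediction = float(model.predict(features)[0])
    return jsonify({"predicted_price": prediction})

if __name__ == "__main__":
    app.run(debug=True)
```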

Clinical Decision Support System for Cancer Screening

Business Impact Summary:

Developed a deep learning application that predicts breast cancer from medical data, assisting healthcare professionals with early diagnosis and effective patient management. Leveraging advanced machine learning techniques and neural networks, this project streamlines the process of identifying the likelihood of breast cancer based on clinical features, providing rapid and accurate predictions for practical healthcare use.

    Technical Features:

  • Data Ingestion: Loads and preprocesses real-world breast cancer datasets for robust model training and testing.
  • Dataset Preparation: Utilizes comprehensive datasets with features relevant to breast cancer diagnosis, such as cell characteristics and biopsy data.
  • Exploratory Data Analysis (EDA): Performs EDA to understand feature distributions, correlations, and visualize data patterns.
  • Data Preprocessing: Applies normalization, scaling, and feature engineering to improve model performance.
  • Model Training: Implements state-of-the-art neural networks and machine learning algorithms (e.g., logistic regression, random forest, CNNs) to classify cases as malignant or benign (see the sketch below).
  • Performance Visualization: Visualizes training and validation accuracy/loss curves, ROC curves, and confusion matrices to evaluate model effectiveness.
  • Interactive Notebooks: Jupyter Notebooks provide step-by-step documentation, code, and visualizations for transparency and reproducibility.
  • Result Interpretation: Offers clear output of predicted cancer status along with confidence scores for each prediction.
  • Export and Deployment: Supports exporting trained models for deployment in real-world or mobile applications.
  • Performance: Achieved accuracy of 97%.

Tools: Jupyter Notebook, Python, TensorFlow/Keras (or PyTorch), Scikit-Learn, Data Visualization, NumPy, Matplotlib/Seaborn, Machine Learning, Deep Learning, Streamlit Community Cloud.
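
For illustration, a compact Keras classifier on tabular clinical features might look like the sketch below, which uses the public scikit-learn Wisconsin breast cancer dataset as a stand-in for the project's data; the layer sizes are assumptions:

```python
# Hedged sketch of a tabular breast-cancer classifier in Keras, using the
# scikit-learn Wisconsin dataset as a stand-in for the project's data.
import tensorflow as tf
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Scale the clinical features before feeding the network.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu",
                          input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # malignant vs benign
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, validation_split=0.2, epochs=30, verbose=0)

loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {acc:.3f}")
```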

Educational Analytics & Performance Optimization System

Business Impact Summary:

Developed a machine learning-based application to predict student academic performance, enabling educators and institutions to identify at-risk students early and implement targeted interventions. By leveraging advanced data analytics and predictive modeling, this project streamlines the evaluation of student outcomes based on a variety of academic and socio-demographic factors, facilitating data-driven decisions for improved educational management and student success.

    Technical Features:

  • Data Ingestion: Efficiently loads and preprocesses real-world student datasets from diverse sources for robust model training and evaluation.
  • Dataset Preparation: Utilizes comprehensive educational datasets with features such as grades, attendance, parental background, and study habits relevant to academic performance prediction.
  • Exploratory Data Analysis (EDA): Performs in-depth EDA to understand feature significance, uncover data trends, and visualize relationships affecting student outcomes.
  • Data Preprocessing: Implements normalization, encoding, and feature selection/engineering to enhance model accuracy and reliability.
  • Model Training: Applies a range of machine learning algorithms (e.g., logistic regression, decision trees, random forest, neural networks) to classify student performance levels or predict grades (see the sketch below).
  • Performance Visualization: Visualizes model training/validation metrics, confusion matrices, and feature importances for comprehensive performance evaluation.
  • Interactive Notebooks: Jupyter Notebooks provide transparent, step-by-step analysis with code, commentary, and visualizations to enhance reproducibility and ease of understanding.
  • Result Interpretation: Delivers interpretable prediction outputs, including probability/confidence scores, to aid actionable insights for educators.
  • Export and Deployment: Supports exporting trained models for integration into educational platforms or deployment via web applications.
  • Performance: Achieved accuracy of 97%.

Tools: Jupyter Notebook, Python, Scikit-Learn, NumPy, Pandas, Matplotlib/Seaborn, Machine Learning, Data Visualization, Streamlit Community Cloud.
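
As a hedged sketch of the modelling step, the snippet below trains a random forest on a hypothetical student dataset and ranks feature importances to surface which factors most drive at-risk predictions; the column names and path are placeholders:

```python
# Illustrative student-performance classifier; the CSV path, column names,
# and pass/fail target are assumptions for the sketch.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/student_records.csv")  # hypothetical dataset
# Assumes categorical features are already encoded numerically.
X = df.drop(columns=["passed"])
y = df["passed"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))

# Rank features (attendance, prior grades, study hours, ...) by importance
# so educators can see which factors most influence the predictions.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```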

Resume

Want to know more about my job and education? Download my resume to get the complete details.

Download Resume

Contact Me

Feel free to reach out to me through the following platforms:
