About Me

Sheldon Kemper – Data Engineer & Machine Learning Specialist

Welcome! I'm Sheldon Kemper, a Data Engineer with a deep passion for problem-solving. From an early age, I've been fascinated by uncovering solutions and optimising processes, which naturally led me to a career where I could transform data into actionable insights. I specialise in creating and optimising data solutions using Azure, Snowflake, and Databricks, focusing on building efficient data pipelines that empower businesses to streamline operations and make data-driven decisions with tangible impact.

My career journey has been one of evolution and overcoming challenges. Transitioning from a Risk Manager to a Business Insights Manager, and now a Data Engineer, I’ve adapted my skills to meet new demands and deliver results. This progression has not only expanded my technical abilities but also deepened my understanding of how data shapes business strategies and influences outcomes.

I recently completed the Data Science Career Accelerator at the University of Cambridge, further strengthening my expertise in machine learning and advanced analytics. My goal is to leverage these enhanced skills to provide deeper insights that help businesses not only understand their past actions but also shape future strategies more effectively. I’m driven by the belief that the right data, interpreted accurately, can transform decision-making and propel innovation.

I’m enthusiastic about collaborating on opportunities where data can make a meaningful impact. If you’re interested in discussing data engineering, machine learning, or innovative projects that harness the power of data, let’s connect!

Projects

  • Harnessing RAG and Agents for Financial Insight

    In this project, I delve into the transformative potential of Retrieval-Augmented Generation (RAG) and agent-based architectures within the financial services space. Leveraging large language models to retrieve targeted data in real time, combined with purpose-built AI agents, creates a framework that ensures robust data governance and real-world impact. By integrating these components into existing infrastructures, organisations unlock faster analytics, more dynamic decision-making, and enhanced scalability for critical financial applications. This forward-looking approach not only drives seamless data connectivity, but also enables meaningful insights that align with evolving industry demands.

    Harnessing RAG and Agents for Financial Insight by Sheldon Lee Kemper

    Read on Substack
  • How Time Series Forecasting Sharpens Book Sales Predictions

    In this project, I examine the power of time series forecasting to reduce uncertainty and drive data-informed strategies. Through robust modelling techniques, ranging from classic ARIMA to deep learning frameworks, businesses can derive forward-looking insights that inform everything from resource allocation to market positioning. By embracing these predictive methodologies, organisations move beyond reactive planning and gain the agility needed to navigate rapid industry shifts, ultimately delivering sustained value in an increasingly competitive landscape.

    Beyond Guesswork—How Time Series Forecasting Sharpens Book Sales Predictions

    Read on Substack
  • Predicting Student Dropout Rates with Machine Learning

    In this project, I applied machine learning models—XGBoost and a neural network—to predict student dropout rates. Using a dataset of over 25,000 student records, I performed data cleaning and feature selection to retain 11 key features, including attendance rates, contact hours, and unauthorised absences. The models were fine-tuned using techniques like GridSearchCV, and their performance was evaluated using metrics such as accuracy, precision, and recall. The neural network achieved an accuracy of 97.53%, slightly outperforming the XGBoost model. The analysis highlighted the importance of academic performance and engagement in predicting student success, providing actionable insights for educational institutions to intervene early and support at-risk students.

    Predicting Student Dropout Rates with Machine Learning by Sheldon Kemper

    A Deep Dive into XGBoost and Neural Networks

    Read on Substack
  • Detecting Anomalous Activity of a Ship's Engine

    This notebook explores anomaly detection in a ship's engine dataset using machine learning algorithms. The key methods, One-Class SVM and Isolation Forest, identify anomalous engine behaviour based on operational parameters such as engine RPM, lubrication oil pressure, and coolant temperature. The project involves preprocessing the data, feature engineering, and visualising results using techniques like PCA. The two methods are compared, with tuning applied to improve model performance, ultimately aiming to enhance predictive maintenance and reduce the risk of engine failures.

    Detecting Anomalous Activity of a Ship's Engine by Sheldon Kemper

    Navigating the Waters of Predictive Maintenance with Data Science

    Read on Substack
  • Customer Segmentation with Clustering

    This notebook focuses on customer segmentation using machine learning algorithms. It processes an e-commerce dataset to create key features like frequency, recency, customer lifetime value (CLV), and average unit cost. The K-means clustering algorithm is used to group customers, with the optimal number of clusters determined using the Elbow Method and Silhouette Score. PCA (Principal Component Analysis) and t-SNE (t-Distributed Stochastic Neighbor Embedding) are applied for dimensionality reduction and cluster visualisation. The goal is to identify distinct customer segments to enhance marketing strategies.

    Harnessing Data to Uncover Customer Insights by Sheldon Kemper

    A Journey into Advanced Segmentation

    Read on Substack
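
As a rough illustration of the feature-building step in the customer segmentation project above, the sketch below derives recency, frequency, and average unit cost from raw order lines. It is not the notebook's code: the function name rfm_features, the tuple layout, the sample orders, and the use of total spend as a simple stand-in for CLV are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, snapshot):
    """Aggregate order lines into per-customer features.

    transactions: iterable of (customer_id, order_date, quantity, unit_price)
    snapshot: the date from which recency is measured (hypothetical choice)
    """
    acc = defaultdict(lambda: {"last": None, "orders": 0, "units": 0, "spend": 0.0})
    for customer, day, quantity, unit_price in transactions:
        row = acc[customer]
        # Track the most recent purchase date seen so far.
        row["last"] = day if row["last"] is None else max(row["last"], day)
        row["orders"] += 1
        row["units"] += quantity
        row["spend"] += quantity * unit_price

    return {
        customer: {
            "recency_days": (snapshot - row["last"]).days,
            "frequency": row["orders"],
            # Assumption: total historical spend used as a crude CLV proxy.
            "total_spend": row["spend"],
            # Guard against customers whose lines sum to zero units.
            "avg_unit_cost": row["spend"] / row["units"] if row["units"] else 0.0,
        }
        for customer, row in acc.items()
    }


# Made-up sample orders, purely for demonstration.
orders = [
    ("A", date(2024, 1, 5), 2, 10.0),
    ("A", date(2024, 3, 1), 1, 40.0),
    ("B", date(2024, 2, 10), 5, 4.0),
]
features = rfm_features(orders, snapshot=date(2024, 3, 31))
```

Features like these would typically be scaled before being passed to a clustering algorithm such as K-means, since recency in days and spend in currency sit on very different ranges.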

Experience

  • Data Engineer at Capgemini

    February 2025 – Present, London Area, United Kingdom (Hybrid)

    • Design and build high-performance data pipelines using Databricks and Apache Spark to extract, transform, and load data into Azure Data Lake Storage and related Azure services.
    • Develop and maintain secure data warehouses and lakehouses by implementing robust data models, comprehensive data quality checks, and stringent governance practices, ensuring reliable and accurate information.
    • Build and deploy AI/ML models by integrating machine learning into data pipelines, leveraging Databricks ML and Azure ML to deliver predictive insights that drive actionable business value.
    • Monitor and optimise data pipelines and infrastructure by analysing performance metrics, identifying bottlenecks, and implementing strategic improvements to enhance both efficiency and scalability.
    • Collaborate closely with cross-functional teams—including business analysts, data scientists, and DevOps engineers—to drive successful data platform implementations.
    • Engage in continuous learning and adaptation to the evolving landscape of big data technologies and best practices, ensuring delivery of innovative, future-proof solutions.
  • Data Engineer at The Go-Ahead Group

    November 2022 – February 2025

    • Engineered scalable data pipelines using Azure Data Factory, Snowflake, Databricks, and Spark, improving data processing efficiency by 40%.
    • Collaborated with stakeholders to deliver customised data solutions, reducing data retrieval times and enhancing overall operational efficiency.
    • Architected and maintained ETL processes, automating workflows to reduce errors by 30% and speeding up data delivery.
    • Implemented Azure Key Vault to enhance data security, reducing unauthorised access incidents by 25%.
    • Optimised data processing with Databricks and Spark, cutting query times by 60% and reducing cloud costs by 20%.
    • Led initiatives in data versioning and lineage tracking, improving traceability and reporting accuracy.
  • Commercial Insights Manager at The Go-Ahead Group

    August 2021 – November 2022

    • Led a team of four data analysts, enhancing their skills in data modelling and visualisation through advanced training in DAX and Power BI, resulting in a 30% boost in efficiency.
    • Developed over 20 advanced Power BI models that enhanced decision-making across Finance, Operations, and Marketing departments.
    • Collaborated with senior stakeholders to align data insights with strategic goals, fostering a stronger data-driven culture.
    • Improved existing Power BI models, cutting processing time by 50%, and provided training and support that increased user adoption by 80%.
    • Led cross-functional projects, improving group reporting efficiency by 25%.
  • Risk & Claims Manager at City of Oxford Bus Company

    June 2018 – January 2021

    • Led the risk management function, utilising data insights to develop strategic plans that reduced incident rates by 15%.
    • Managed the claims process with a data-driven approach, cutting processing time by 20% and improving settlement accuracy.
    • Ensured compliance in handling CCTV data, reducing fraudulent claims by 10%.
    • Spearheaded Power BI implementation for KPI reporting, creating interactive dashboards that enhanced decision-making and operational transparency.
    • Automated data reporting systems, reducing manual errors and decreasing report preparation time by 30%.
  • Freelance Web Developer

    2011 – 2016

    • Developed and deployed bespoke web applications using PHP, JavaScript, HTML, CSS, and Drupal, focusing on high performance, security, and scalability.
    • Collaborated with clients to manage end-to-end project lifecycles, ensuring responsive, user-centric design optimised for diverse devices and browsers.
    • Managed Apache server environments, implementing performance enhancements to maximise application speed and reliability.
    • Delivered a wide range of projects, including e-commerce platforms and content management systems.

Education

  • Data Science Career Accelerator

    University of Cambridge, June 2024 – March 2025

    A specialised programme aimed at equipping professionals with essential skills in data science and machine learning. The course covers core areas like data analytics, machine learning, neural networks, data visualisation, and big data management, preparing students to tackle real-world data-driven challenges.

  • BSc (Honours) Computing and IT

    Open University, 2015 – September 2020

    Provided a strong foundation in computing and IT, with a focus on software development, data science, and networks. Equipped with essential skills in problem-solving, programming, and system design, along with an understanding of ethical, social, and legal issues related to technology.

  • Advanced Diploma, System and Data Analysis

    University of Oxford, 2013

    Offered in-depth training in data management, system analysis, and statistical methodologies. Focused on interpreting complex data sets, optimising systems, and making data-driven decisions.