Portfolio - Shivam Agarwal

About Me

Hi there! I'm Shivam Agarwal, a passionate data enthusiast and recent Master of Science in Data Science (MSDS) graduate of the University of Washington, Seattle, with a background in Machine Learning, Data Engineering, and Cloud Computing.

With over three years of experience as a Software Engineer III at Deloitte and a recent Data Engineering Internship at Amazon, I specialize in designing scalable AI-driven data solutions and optimizing complex data pipelines.

At Amazon, I built predictive analytics tools and optimized resource allocation, saving $200K annually. At Deloitte, I led multiple GenAI projects, engineered high-efficiency ETL workflows, and developed cloud-native data pipelines across AWS, GCP, and Azure. I also hold certifications in AWS Data Analytics Specialty and Generative AI Fundamentals from Databricks.

Beyond work, I actively contribute to open-source projects like Airbyte and TorchFix by Meta. I also love competing in hackathons and experimenting with cutting-edge AI technologies.

When I'm not coding, you'll find me exploring new music, perfecting my coffee routine, or working on innovative side projects.

I'm currently seeking full-time opportunities starting March 2025.

Feel free to reach out if you'd like to connect, collaborate, or just chat about data!

Experience

Software Engineer III – Deloitte (2020-2023)

Designed and implemented AI-driven solutions, leveraging models like GPT and DALL-E to enhance automation and analytics across business units.

Optimized data pipelines, improving processing efficiency by 30% and reducing workflow execution time by 73% through advanced validation techniques.

Developed a large-scale LLM-powered code correction tool, achieving 93% accuracy in automated Python code fixes.

Consultant (Part-time) – Eaglytics Co. (2021-2023)

Engineered cost-efficient, scalable data pipelines on GCP, reducing annual processing costs by $18,000.

Integrated APIs from HubSpot, Google Ads, and Airtable to manage 15TB+ of data, improving data retrieval speed by 35%.

Developed high-impact dashboards, enhancing data accessibility and readability by 60%.

Data Engineering Intern – Amazon (2024)

Managed high-scale recruitment data pipelines processing 680M+ records using AWS services like Lake Formation, Glue, and Athena.

Designed Amazon QuickSight dashboards serving 1,000+ users, leading to $200,000 in cost savings through data-driven insights.

Built predictive analytics pipelines, optimizing project ROI tracking and improving ML model performance by 6%.

Projects

Spotify Playlist Continuation

In this project, I set out to improve music recommendations by predicting which tracks should continue a Spotify playlist, using the Spotify Million Playlist Dataset (MPD). I combined two established recommendation techniques, content-based filtering and collaborative filtering. I processed 1 million playlists and 2.26 million unique tracks, and optimized the data pipeline with PySpark to cut processing time by 53%.

Tech Used: Python, PySpark, Machine Learning Algorithms, SQL
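
To give a feel for the collaborative filtering idea, here is a minimal sketch of co-occurrence-based playlist continuation in plain Python. It is an illustration only, not the project's actual code: the toy playlists, the function names (build_cooccurrence, continue_playlist), and the simple count-based scoring are all assumptions made for this example, and the real pipeline ran on PySpark at a much larger scale.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: each playlist is a list of track IDs.
TOY_PLAYLISTS = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c", "e"],
    ["a", "d"],
]


def build_cooccurrence(playlists):
    """Count how often each pair of tracks shares a playlist."""
    cooc = defaultdict(Counter)
    for tracks in playlists:
        # set() so a track repeated within one playlist is counted once
        for a, b in combinations(sorted(set(tracks)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def continue_playlist(seed_tracks, cooc, k=3):
    """Rank candidate tracks by total co-occurrence with the seed tracks."""
    seed = set(seed_tracks)
    scores = Counter()
    for track in seed:
        for other, count in cooc.get(track, {}).items():
            if other not in seed:
                scores[other] += count
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


cooc = build_cooccurrence(TOY_PLAYLISTS)
suggestions = continue_playlist(["a", "b"], cooc, k=2)
print(suggestions)
```

Tracks that often appear alongside the seed tracks score highest. A content-based component would instead compare track attributes (such as audio features), which helps with tracks that appear in few playlists.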

Spotify Audio Explorer

This project visualizes music consumption patterns using the Spotify Million Playlist Dataset and Spotify's API. I processed 32GB of playlist data, merging it with additional track and artist details via the Spotify API. Python was used for data cleaning and transformation, utilizing libraries such as Pandas and Spotipy. The dataset was optimized using PySpark for efficient handling, and the final analysis included over 134,000 unique tracks and 14,796 artists. Tableau was employed for dashboard development, showcasing key audio features like energy, danceability, and valence.

Tech Used: PySpark, Pandas, Spotify API, Tableau

AI-Driven Stock Predictor

I developed a machine learning model to predict stock market trends using historical data and sentiment analysis. The project used NLP techniques to analyze news articles and correlate sentiment with stock price movements, along with technical indicators like MACD and RSI to improve forecasting accuracy.

Tech Used: Python, Natural Language Processing (NLP), Generative AI (OpenAI API)
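
For readers unfamiliar with the two technical indicators mentioned above, here is a minimal sketch of how MACD and RSI can be computed from a list of closing prices. It is not the project's actual code: the function names are hypothetical, the 12/26/9 and 14-period settings are just the commonly used defaults, the EMA is seeded with the first price, and the RSI uses plain averages of gains and losses rather than Wilder's smoothing.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) for a list of closing prices."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line


def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes.

    Simple-average variant (an assumption for this sketch).
    """
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    recent = prices[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat prices: treat as neutral
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical, steadily rising prices for demonstration.
rising = list(range(1, 60))
macd_line, signal_line = macd(rising)
print(macd_line[-1] > 0, rsi(rising))
```

In a forecasting model, values like these would typically be added as features next to the news sentiment scores.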