Alex Walsh
Data Scientist | Problem Solver | Science Enthusiast

My Expertise

As a data scientist with a background in physics, I have experience exploring and scrutinizing data. I have strong math and programming skills and a passion for learning new techniques. My deep understanding of Python, exploratory data analysis, and machine learning modeling allows me to make strong contributions to any team. I love coming to a deeper understanding of the nature of a dataset, and I believe it is my ethical duty to present data accurately.

I have years of experience with the core data-centric libraries in Python, such as NumPy, Pandas, Scikit-Learn, and TensorFlow. This experience comes from my education, from coding as a hobby, and from a professional data science certification from General Assembly. I look forward to developing my data science career in the years to come, honing my coding skills, and keeping up with new technologies and breakthroughs in machine learning.

Code

I have experience with important data science tools such as Python, PostgreSQL, and Apache Spark. I am also familiar with AWS, and I have some C++ and shell scripting experience.

Analyze

My preferred analysis libraries are Scikit-Learn and TensorFlow, with NumPy and Pandas for statistical analysis. I may supplement these libraries with SciPy and Statsmodels.

Understand

Visualization is a crucial part of understanding data. I approach this with standard Matplotlib and/or Seaborn in Python, or I use Tableau for more advanced visualization.

Featured Projects


Gasoline Price Prediction

  • Time Series Analysis, Deep Learning, ARIMA, Linear Forecasting

A time series forecasting project to predict the price of gasoline in the United States. This project used both numerical data, such as oil prices, and text data from relevant news publications. The model was trained on data from before 2019, and projections were made for the years 2019-2023 with an average error of $0.035.

Check it out

Credit Card Fraud Detection

  • Data Simulation, Feature Engineering, Anomaly Detection, Oversampling, SMOTE, Git Management

A project in anomaly detection to identify cases of fraudulent purchases in simulated credit card data. Despite the overwhelming majority of the data being non-fraudulent, the feature engineering and modeling techniques used in this project were able to achieve a recall of 97%.
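For readers unfamiliar with the metric, here is a minimal sketch of why recall matters more than accuracy on imbalanced data. It is illustrative only and is not the project's code; the function name `recall` and the toy labels are made up for this example.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos + false_neg == 0:
        raise ValueError("y_true contains no positive examples")
    return true_pos / (true_pos + false_neg)


# Hypothetical toy data: 95 legitimate purchases (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5

# A model that never flags fraud is 95% accurate but catches nothing.
never_flags = [0] * 100

# A model that catches 4 of the 5 fraudulent purchases.
catches_most = [0] * 95 + [1, 1, 1, 1, 0]

print(recall(y_true, never_flags))
print(recall(y_true, catches_most))
```

With labels this skewed, a model can score high accuracy while missing every fraudulent purchase, which is why recall is the number I report for this project.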

Check it out

Satire Detection in News Articles

  • Natural Language Processing, Tokenization, Lemmatization, Random Forests, Stacked Classifiers, Web Scraping

This project, my first attempt at language classification, used news articles posted to the Reddit communities r/TheOnion and r/nottheonion. I built a custom web scraper to gather the data, and after cleaning it I removed words that appeared disproportionately in either category. My model classified the two types of text with an accuracy of 95.4%, compared to a 50% baseline.

Check it out

Regression Analysis of Ames Housing Data

  • OLS Linear Regression, Lasso, Ridge, Preprocessing

This project was an attempt to model the price of homes sold in Ames, Iowa. My analysis was limited to EDA and a few flavors of linear regression. My model was able to achieve an RMSE of $14,401.
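As a quick reference for how to read that figure, here is a minimal sketch of the RMSE calculation. It is illustrative only and is not the project's code; the function name `rmse` and the sale prices are made up for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean of squared differences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical sale prices and predictions, in dollars.
actual_prices = [200_000, 150_000]
predicted_prices = [210_000, 140_000]

print(rmse(actual_prices, predicted_prices))
```

Because RMSE is in the same units as the target, it can be read as a typical prediction error in dollars, with larger misses weighted more heavily than small ones.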

Check it out

Exploratory Data Analysis of Educational Trends

  • EDA, Pandas, NumPy, Matplotlib, Seaborn

In this exploratory analysis project, I was tasked with drawing conclusions about trends in educational performance in the United States as a whole, as well as in my local city. The data consisted primarily of standardized test scores, and my conclusions were also informed by expert statements.

Check it out