Data Science portfolio - the tip of the iceberg

My name is Romain Guion, and I head an engineering and data science department at a London-based startup called Vortexa. I previously worked as a data scientist at Pluto TV, a Los Angeles-based entertainment startup, and at Padam, an AI-driven transportation startup. Prior to that, I worked for four years in the healthcare industry as a consultant, project leader and scientist. I also briefly worked as an M&A analyst in the energy industry and as a rocket scientist.

I am also a graduate of the University of Cambridge, and I hold an MSc in Math and Physics from Ecole Centrale Paris. The maths covered advanced statistical learning, discrete-time stochastic processes, optimization, signal processing and financial mathematics, as well as a range of topics in linear algebra, analysis and topology.

Although Data Science is mainly about critical thinking, people like to talk about tools, so here they are:

All my professional work is confidential. My LinkedIn captures my professional journey. Until now, I had never taken the time to document my side projects, so this is a work in progress, and suggestions are welcome!

(Note: the one part of my work that is public is my patents.)

Visuals first, text after


Tree predicting passenger survival on the Titanic
Learning curve on the Titanic dataset. The gap between validation and training accuracy suggests a variance problem.
K-means
Using K-means for color compression
Using the top 36 eigenvectors (PCA)
Pattern recognition on a Raspberry Pi using OpenCV and TensorFlow
Accessing and plotting stock data from Python
Mono-user movie recommendation system
Anomaly detection (project from Andrew Ng's course)
Turning 10,000 lines of listing information into a plot representative of the booking experience. Tool: Python.
London murders map
Attenuation contour plot of an Acoustic Doppler Current Profiler
A/B/C test in progress: uncertainty estimation
Conversion rate: Bayesian approach
Anomaly detection on time series
Deep neural network classifier labeling images as cat or non-cat
Convolutional neural network classifier labeling sign language digits
Implementation of ResNet50 in Keras
Convolutional neural network identifying smiling faces
YOLO algorithm applied to car images / videos
Class activation mapping on a vessel detection ConvNet classifier
Ship image segmentation with a U-Net
Class activation mapping on an X-ray ConvNet classifier
SHAP model feature importance for a specific record
Trigger word recognition from an audio sequence using Conv1D and GRU

Long projects (a few months)

Mid-size projects (a few days)

Mini projects (<= 1 day)

Math

Python

Most of the mini-projects that follow were fairly easy and came with extensive guidance. They were nevertheless insightful, as they helped bridge the gap between what I know in math and algorithmics and real-world applications.

Java project

Java notes. Here are some key notes I took while following a few Java courses:

Scala. Some key learnings from following Martin Odersky’s EPFL course on functional programming in Scala and Spark:

Data Science. While following online courses, I did quite a few mini-projects. These were very simple and high-level, but a good excuse to get up to date with the Python tools popular in data science today: NumPy, pandas, Matplotlib, Seaborn, Plotly, scikit-learn, Python SQL interfaces, SciPy, TensorFlow, NLTK, SHAP, lifelines, etc. Here are a few for which I took some notes:

Matlab. When I followed Andrew Ng’s Stanford Machine Learning course, I did quite a few mini-projects that led me to write classic machine learning algorithms from first principles (algorithmics and matrix computations).

PySpark