Learn, Imagine, Build
Geoff Messier's Projects & Ideas
The purpose of this section is to give students who are brand new to our group something to look at to get up to speed. Here, I’m trying to strike a balance between giving you the fundamentals that everyone should know and not spending too much time exploring techniques that you might not use in your specific project.
Everything in this section is really good but it may or may not be useful for you, depending on the focus of your project.
Part II of Ian Goodfellow’s Deep Learning book is a good introduction to neural networks. Michael Nielsen’s book also gives an accessible introduction to the topic.
The two libraries we most often use for neural network coding are Keras and scikit-learn. Both have plenty of tutorials online, but Machine Learning Mastery (see below) is a good place to start for Keras.
Machine Learning Mastery is an excellent website for all things machine learning but particularly deep learning. The site also has some good tutorials for getting started with Keras.
Exploratory data analysis (EDA), introduced in Chapter 4 of Seltman’s book, is a very important and often overlooked aspect of machine learning and data analysis. In order for your algorithm to produce good results, you must first understand the nature of your data and check whether it contains any obvious inconsistencies or errors. EDA essentially means using relatively straightforward plots and summary statistics to do this.
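To make this concrete, here is a minimal sketch of a first EDA pass using only the Python standard library. The records, the column names (`age`, `stays`) and the plausible age range of 0 to 120 are all made up for illustration. In a real project you would more likely load your data into a dataframe and add plots.

```python
import statistics

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "stays": 12},
    {"age": 51, "stays": 3},
    {"age": None, "stays": 7},
    {"age": 29, "stays": 250},
    {"age": 430, "stays": 5},  # implausible age, likely a data-entry error
    {"age": 46, "stays": 9},
]


def summarize(rows, column):
    """Return basic summary statistics for one numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def out_of_range(rows, column, low, high):
    """Return the rows whose value falls outside a plausible range."""
    return [
        r for r in rows
        if r[column] is not None and not (low <= r[column] <= high)
    ]


age_summary = summarize(records, "age")
suspect_ages = out_of_range(records, "age", 0, 120)  # assumed plausible range

print(age_summary)
print(suspect_ages)
```

Even on a table this small, the gap between the mean and the median of `age` hints that something is off, and the range check points at the offending record.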
There are a variety of metrics used to evaluate the performance of a classification algorithm. I have some notes here that expand on this topic.
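As a rough illustration, the sketch below computes four commonly used metrics (accuracy, precision, recall and F1) for a binary classifier from first principles. The choice of these four metrics and the toy labels are my assumptions for the example. The label 1 stands for the adverse outcome.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical, imbalanced toy labels: only 3 of 10 cases are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these labels the accuracy looks respectable even though the classifier finds only one of the three positive cases. That is why accuracy alone can be misleading on imbalanced data.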
Much of our work uses machine learning algorithms to predict adverse future outcomes from features that have accumulated in an individual’s data record. A related technique commonly used by medical researchers and biostatisticians is survival analysis. Survival analysis looks for features in the data that increase the risk of an adverse outcome. It makes a series of very specific assumptions about the time to the occurrence of these outcomes and about whether they are censored, for example because the study ended before the outcome was observed.
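To show how censoring enters the calculation, here is a minimal sketch of the Kaplan-Meier estimator, a standard non-parametric estimate of the survival curve. It does not model features at all. I include it only because it is the simplest place to see censored observations handled correctly. The durations and event flags are invented, and in practice you would probably use a dedicated survival analysis library.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations[i] is the follow-up time of subject i. event_observed[i] is
    True if the outcome occurred at that time and False if the subject was
    censored (for example, still outcome-free when the study ended).
    """
    event_times = sorted({t for t, e in zip(durations, event_observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t, including
        # those who are censored exactly at t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(
            1 for d, e in zip(durations, event_observed) if d == t and e
        )
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months. The two subjects at 8 months are
# censored by the end of the study. One subject is lost to follow-up at 3.
durations = [2, 3, 3, 5, 8, 8]
event_observed = [True, True, False, True, False, False]

curve = kaplan_meier(durations, event_observed)
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.3f}")
```

Censored subjects never trigger a drop in the curve, but they still count in the at-risk group until the time they leave observation. Simply discarding them would bias the estimate.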
Some background reading: