Machine Learning
getting started
google://getting started with machine learning
https://www.kaggle.com/wiki/GettingStartedWithPythonForDataScience - in progress
https://www.quora.com/I-want-to-learn-machine-learning-Where-should-I-start
http://thunderboltlabs.com/blog/2013/11/09/getting-started-with-machine-learning/
http://machinelearningmastery.com/machine-learning-for-programmers/
https://www.kaggle.com/dfernig/reddit-comments-may-2015/the-biannual-reddit-sarcasm-hunt/code
course at Coursera: https://www.coursera.org/learn/machine-learning/home/week/1
book: Understanding Machine Learning: From Theory to Algorithms
algorithms
- random forest
- https://medium.com/rants-on-machine-learning/the-unreasonable-effectiveness-of-random-forests-f33c3ce28883
- Nearest Neighbors Classification
- http://scikit-learn.org/stable/modules/neighbors.html
tools
python + libs
- SystemML: a Universal Translator for Big Data and Machine Learning
sample data
blogs
Cool Projects
https://github.com/aficnar/slackpolice
- Aerospace Controls Lab
- http://acl.mit.edu/
- https://www.youtube.com/channel/UCVTxuaJsdMrk3UEcHVll9Yg
Data leaks
A data leak occurs when data associated with the data set gives away the target values.
Primarily a concern in competitions.
The leak is unexpected information in the data that allows unrealistically good predictions.
reference: https://www.coursera.org/learn/competitive-data-science/lecture/5w9Gy/basic-data-leaks
Future peeking - using time series data from outside the target time period, for example from the future.
Metadata leaks - for example file metadata, zip file metadata, image file metadata.
Information hidden in IDs and hashes.
Information hidden in row order, and possibly in duplicate rows.
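A minimal sketch of two basic checks for the leak types above, assuming a toy data set of (row_id, observation_date, feature, target) tuples. The records and the helper names time_split and duplicate_groups are made up for illustration and are not taken from the course. The first helper splits by date so that validation rows never precede training rows (guarding against future peeking); the second groups rows that are identical apart from their ID.

```python
from datetime import date

# Hypothetical toy records: (row_id, observation_date, feature, target)
rows = [
    (1, date(2015, 1, 5), 0.2, 0),
    (2, date(2015, 2, 9), 0.7, 1),
    (3, date(2015, 3, 14), 0.4, 0),
    (4, date(2015, 4, 20), 0.9, 1),
    (5, date(2015, 4, 20), 0.9, 1),  # identical to row 4 except for the id
]


def time_split(records, cutoff):
    """Train on rows strictly before the cutoff, validate on the rest."""
    train = [r for r in records if r[1] < cutoff]
    valid = [r for r in records if r[1] >= cutoff]
    return train, valid


def duplicate_groups(records):
    """Return lists of row ids whose non-id fields are identical."""
    seen = {}
    for row_id, *rest in records:
        seen.setdefault(tuple(rest), []).append(row_id)
    return [ids for ids in seen.values() if len(ids) > 1]


train, valid = time_split(rows, cutoff=date(2015, 4, 1))
dupes = duplicate_groups(rows)

print("train ids:", [r[0] for r in train])
print("valid ids:", [r[0] for r in valid])
print("duplicate groups:", dupes)
```

A random shuffle split on the same rows could put row 5 in training and row 4 in validation, which is the kind of leak these checks are meant to surface. Real data would likely need fuzzier duplicate matching than exact equality.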
Reading Room
- Detecting tanks https://www.jefftk.com/p/detecting-tanks
Chapter
https://github.com/FlorianMuellerklein/Machine-Learning
Improving our neural network (96% MNIST) https://databoys.github.io/ImprovingNN/
https://iamtrask.github.io/2015/07/12/basic-python-network/
https://plot.ly/python/create-online-dashboard/
https://www.anaconda.com/download/