Machine Learning
getting started
google://getting started with machine learning
https://www.kaggle.com/wiki/GettingStartedWithPythonForDataScience - in progress
https://www.quora.com/I-want-to-learn-machine-learning-Where-should-I-start
http://thunderboltlabs.com/blog/2013/11/09/getting-started-with-machine-learning/
http://machinelearningmastery.com/machine-learning-for-programmers/
https://www.kaggle.com/dfernig/reddit-comments-may-2015/the-biannual-reddit-sarcasm-hunt/code
course: at coursera https://www.coursera.org/learn/machine-learning/home/week/1
understanding machine learning theory algorithms
algorithms
- random forest
- https://medium.com/rants-on-machine-learning/the-unreasonable-effectiveness-of-random-forests-f33c3ce28883
- Nearest Neighbors Classification
- http://scikit-learn.org/stable/modules/neighbors.html
tools
python + libs
- SystemML - a Universal Translator for Big Data and Machine Learning
image labeling
https://github.com/Labelbox/Labelbox
TensorFlow Playground
http://playground.tensorflow.org
sample data
blogs
Cool Projects
https://github.com/aficnar/slackpolice
- Aerospace Controls Lab
- http://acl.mit.edu/
- https://www.youtube.com/channel/UCVTxuaJsdMrk3UEcHVll9Yg
Data leaks
When data associated with the data set gives away the target.
Primarily a concern in competitions.
Unexpected data.
reference: https://www.coursera.org/learn/competitive-data-science/lecture/5w9Gy/basic-data-leaks
Future peeking - using time series data from outside the target time period, for example from the future.
Metadata leaks - for example file metadata, zip file metadata, image file metadata.
Information hidden in IDs and hashes,
and information hidden in row order and possibly duplicate rows.
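A rough sketch of how some of these leaks could be checked with pandas. The column names (id, day, feature, target), the toy data, and the helper names time_based_split and leak_report are all made up for illustration; they do not come from the course. A high correlation or a duplicate count is only a hint to look closer, not proof of a leak.

```python
import pandas as pd


def time_based_split(df, time_col, cutoff):
    """Split so that validation rows never precede training rows (avoids future peeking)."""
    train = df[df[time_col] < cutoff]
    valid = df[df[time_col] >= cutoff]
    return train, valid


def leak_report(df, target_col, id_col):
    """Collect simple signals that the ID, the row order, or duplicates may reveal the target."""
    row_order = pd.Series(range(len(df)), index=df.index)
    return {
        # Correlation between row position and target
        "row_order_corr": row_order.corr(df[target_col]),
        # Correlation between the (numeric) ID and target
        "id_corr": df[id_col].corr(df[target_col]),
        # Rows that are identical once the ID column is ignored
        "duplicate_rows": int(df.drop(columns=[id_col]).duplicated().sum()),
    }


# Hypothetical toy data: the target is sorted, so row order and ID both leak it.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "day": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-03",
        "2020-01-04", "2020-01-04", "2020-01-05",
    ]),
    "feature": [0.5, 0.1, 0.9, 0.3, 0.3, 0.7],
    "target": [0, 0, 0, 1, 1, 1],
})

train, valid = time_based_split(df, "day", pd.Timestamp("2020-01-04"))
report = leak_report(df, "target", "id")
print(len(train), len(valid))
print(report)
```

Shuffling the rows before training removes the row-order signal, but an ID that was assigned in target order would still need to be dropped.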
Questions and Investigation
What are "ground truths"?
corteges - what is this word
/Coursera's Competitive Data Science Course
Reading Room
- Detecting tanks https://www.jefftk.com/p/detecting-tanks
Kaggle competitions:
Past solutions
http://ndres.me/kaggle-past-solutions/
https://www.kaggle.com/wiki/PastSolutions
http://www.chioka.in/kaggle-competition-solutions/
https://github.com/ShuaiW/kaggle-classification/
https://towardsdatascience.com/how-to-use-dataset-in-tensorflow-c758ef9e4428
https://towardsdatascience.com/how-to-train-neural-network-faster-with-optimizers-d297730b3713
NIPS - Neural Information Processing Systems
Demos and Labs
https://codelabs.developers.google.com/codelabs/scd-babyweight2/index.html#0
https://github.com/GoogleCloudPlatform/training-data-analyst
Chapter
https://github.com/FlorianMuellerklein/Machine-Learning
Improving our neural network (96% MNIST) https://databoys.github.io/ImprovingNN/
https://iamtrask.github.io/2015/07/12/basic-python-network/
https://plot.ly/python/create-online-dashboard/
https://www.anaconda.com/download/
http://jupyter.org/install.html
linear regression in 6 lines of code
source: https://towardsdatascience.com/linear-regression-in-6-lines-of-python-5e1d0cd05b8d
pip install scikit-learn
import numpy as np
import matplotlib.pyplot as plt  # To visualize
import pandas as pd  # To read data
from sklearn.linear_model import LinearRegression

data = pd.read_csv('data.csv')  # load data set
X = data.iloc[:, 0].values.reshape(-1, 1)  # values converts it into a numpy array
Y = data.iloc[:, 1].values.reshape(-1, 1)  # -1 lets numpy infer the number of rows; 1 means a single column
linear_regressor = LinearRegression()  # create object for the class
linear_regressor.fit(X, Y)  # perform linear regression
Y_pred = linear_regressor.predict(X)  # make predictions

plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()
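The snippet above needs a data.csv that is not included in these notes. A minimal sketch of the same fit on synthetic data, useful for checking that the model recovers a known line. The true slope (3.0), intercept (2.0), and noise level are arbitrary values chosen for illustration, not taken from the source article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3x + 2 plus a little Gaussian noise (values are assumptions)
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
Y = 3.0 * X + 2.0 + rng.normal(scale=0.1, size=X.shape)

model = LinearRegression().fit(X, Y)

# With a 2-D Y, coef_ has shape (1, 1) and intercept_ has shape (1,)
slope = float(model.coef_[0][0])
intercept = float(model.intercept_[0])
r2 = model.score(X, Y)  # coefficient of determination on the training data

print(slope, intercept, r2)
```

The fitted slope and intercept should land close to the values used to generate the data; a large gap would suggest a bug in how X and Y were shaped.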