Machine Learning

From Federal Burro of Information

getting started

google://getting started with machine learning - in progress

course: at coursera

understanding machine learning theory algorithms


random forest
Nearest Neighbors Classification


python + libs

image labeling

TensorFlow Playground

sample data


Cool Projects

Aerospace Controls Lab

Data leaks

A data leak occurs when information associated with the data set gives away the target.

Primarily a concern in competitions.

Unexpected data.


Future peeking - using time series data from outside the target time period, for example data from the future.
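
One way to guard against future peeking is to split by time instead of at random. A minimal sketch, assuming a pandas DataFrame with a hypothetical `date` column and a made-up cutoff date (neither comes from the notes above):

```python
import pandas as pd

# Hypothetical toy time series; column names and dates are illustrative only.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "value": range(10),
})

def time_split(frame, cutoff, date_col="date"):
    """Split so that all training rows come strictly before the cutoff."""
    cutoff = pd.Timestamp(cutoff)
    train = frame[frame[date_col] < cutoff]
    test = frame[frame[date_col] >= cutoff]
    return train, test

train, test = time_split(df, "2020-01-08")

# Every training row predates every test row, so nothing from the
# test period can influence training.
assert train["date"].max() < test["date"].min()
```

A random split of the same rows would mix later dates into the training set, which is the situation described above.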

Metadata leaks - for example file metadata, zip file metadata, image file metadata.

Information hidden in IDs and hashes, and information hidden in row order and possibly in duplicate rows.
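
A quick way to look for these leaks is to check whether the ID column or the row position correlates with the target, and to count duplicate rows. A rough sketch on a made-up table (the column names `id`, `feature`, and `target` and the helper `leak_report` are hypothetical, not from any library):

```python
import pandas as pd

# Hypothetical toy data set: the target happens to follow row order,
# and the last row is an exact duplicate of the one before it.
df = pd.DataFrame({
    "id": [101, 102, 103, 104, 105, 105],
    "feature": [3.0, 1.0, 4.0, 1.5, 5.0, 5.0],
    "target": [0, 0, 0, 1, 1, 1],
})

def leak_report(frame, target_col="target", id_col="id"):
    """Return simple signals that IDs, row order, or duplicates may leak the target."""
    row_order = pd.Series(range(len(frame)), index=frame.index)
    return {
        # Pearson correlation between row position and target
        "row_order_corr": row_order.corr(frame[target_col]),
        # Pearson correlation between the numeric ID and target
        "id_corr": frame[id_col].corr(frame[target_col]),
        # Number of rows that exactly repeat an earlier row
        "duplicate_rows": int(frame.duplicated().sum()),
    }

report = leak_report(df)
print(report)
```

A high correlation here is only a hint to investigate further, not proof of a leak; hashed IDs would need a different check, since their numeric value carries no order.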

Questions and Investigation

What are "ground truths"?

corteges - what is this word

/Coursera's Competitive Data Science Course

Reading Room

Kaggle competitions:

Past solutions

NIPS - Neural Information Processing Systems

Demos and Labs


Improving our neural network (96% MNIST)

linear regression in 6 lines of code


pip install scikit-learn pandas matplotlib

import matplotlib.pyplot as plt  # to visualize
import pandas as pd  # to read data
from sklearn.linear_model import LinearRegression

data = pd.read_csv('data.csv')  # load data set
X = data.iloc[:, 0].values.reshape(-1, 1)  # .values converts the column into a numpy array
Y = data.iloc[:, 1].values.reshape(-1, 1)  # -1 lets numpy infer the number of rows; 1 means a single column
linear_regressor = LinearRegression()  # create object for the class
linear_regressor.fit(X, Y)  # perform linear regression
Y_pred = linear_regressor.predict(X)  # make predictions
plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()  # display the scatter plot and fitted line
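
Since `data.csv` is not included here, the same fit can be tried on synthetic data. A minimal sketch, assuming points generated from y = 2x + 1 plus Gaussian noise (the line, the noise level, and the seed are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 100 points around the line y = 2x + 1 (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
Y = 2.0 * X + 1.0 + rng.normal(0, 0.5, size=(100, 1))

model = LinearRegression()
model.fit(X, Y)  # perform linear regression

slope = model.coef_[0][0]        # coef_ has shape (1, 1) because Y is a column
intercept = model.intercept_[0]  # intercept_ has shape (1,)
r2 = model.score(X, Y)           # coefficient of determination on the training data

print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is a quick sanity check before pointing the same code at a real file.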