Machine Learning/scratch notes

From Federal Burro of Information
ML notes

Jupyter Notebook
jupyter notebook --no-browser --ip 0.0.0.0 --port 5000
jupyter notebook --no-browser --ip 0.0.0.0 --port 5000 --log-level=DEBUG

Inline images:

possibly this, to render matplotlib plots inside the notebook:

%matplotlib inline

or possibly this, to display an image file or URL:

from IPython.display import Image
To Read
https://voila.readthedocs.io/en/stable/index.html


Machine Learning

Contents
1	getting started
2	algorithms
3	tools
4	sample data
5	blogs
6	Cool Projects
7	Data leaks
8	Questions and Investigation
9	Reading Room
9.1	NIPS - Neural Information Processing Systems
10	Demos and Labs
11	Image processing
12	Chapter
13	linear regression in 6 lines of code
getting started
google://getting started with machine learning

https://www.kaggle.com/wiki/GettingStartedWithPythonForDataScience - in progress

https://www.quora.com/I-want-to-learn-machine-learning-Where-should-I-start

http://thunderboltlabs.com/blog/2013/11/09/getting-started-with-machine-learning/

http://machinelearningmastery.com/machine-learning-for-programmers/

https://www.kaggle.com/dfernig/reddit-comments-may-2015/the-biannual-reddit-sarcasm-hunt/code

Course at Coursera: https://www.coursera.org/learn/machine-learning/home/week/1

Understanding Machine Learning: From Theory to Algorithms (book)

algorithms
random forest
https://medium.com/rants-on-machine-learning/the-unreasonable-effectiveness-of-random-forests-f33c3ce28883
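A toy sketch of the idea behind random forests, in plain Python: each tree is trained on a bootstrap sample of the rows using a random subset of the features, and the forest predicts by majority vote. To keep it short, every "tree" here is a depth-1 stump split on Gini impurity, and the function names are hypothetical. A real implementation such as scikit-learn's `RandomForestClassifier` grows much deeper trees and re-samples features at every split.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: the single (feature, threshold) split with the lowest weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    # Start from "no split": a constant leaf that predicts the majority label.
    best = (gini(y), None, None, majority, majority)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging plus random feature subsets: each stump sees a bootstrap sample and sqrt(d) features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    max_features = max(1, int(math.sqrt(n_features)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        features = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps (a stump with feature None is a constant leaf)."""
    votes = [left if f is None or row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [1.5, 1.5]), predict_forest(forest, [9.5, 9.5]))
```

Individual trees are weak and noisy (a bootstrap sample can even miss one class entirely); the vote across many of them is what makes the ensemble stable.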
Nearest Neighbors Classification
http://scikit-learn.org/stable/modules/neighbors.html
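A minimal sketch of k-nearest-neighbors classification in plain Python, assuming Euclidean distance and a simple majority vote. `knn_predict` is a hypothetical helper, not the scikit-learn API (scikit-learn provides `KNeighborsClassifier` in `sklearn.neighbors`); `math.dist` needs Python 3.8+.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # a
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # b
```

There is no training step: the "model" is the stored data, and all the work happens at prediction time, which is why large data sets usually need an index structure (such as a KD-tree) instead of this brute-force sort.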
lstm
http://blog.echen.me/2017/05/30/exploring-lstms/
tools
python + libs

Caffe deep learning framework
SystemML: a Universal Translator for Big Data and Machine Learning
http://www.ibm.com/blogs/think/2015/11/24/introducing-a-universal-translator-for-big-data-and-machine-learning/
https://github.com/SparkTC/systemml/
http://researcher.watson.ibm.com/researcher/view_group.php?id=3174
https://developer.ibm.com/open/systemml/
image labeling

https://github.com/Labelbox/Labelbox
TensorFlow Playground

http://playground.tensorflow.org
sample data
http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions

blogs
http://blog.datumbox.com/

Cool Projects
https://github.com/aficnar/slackpolice


Aerospace Controls Lab
http://acl.mit.edu/
https://www.youtube.com/channel/UCVTxuaJsdMrk3UEcHVll9Yg
Data leaks
A data leak is when data associated with the data set gives away the target.

Primarily a concern in competitions.

Unexpected data.

Reference: https://www.coursera.org/learn/competitive-data-science/lecture/5w9Gy/basic-data-leaks

Future peeking: using time series data from outside the target time period, for example data from the future.

Metadata leaks: for example file metadata, zip file metadata, image file metadata.

Information hidden in IDs and hashes.

Information hidden in row order, and possibly in duplicate rows.
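A minimal sketch of two of these checks, assuming each row is a hypothetical `(timestamp, features, target)` tuple: a time-based split so that validation only holds rows from after the training period (guarding against future peeking), and a check for feature rows duplicated between train and test. The function names and row layout are assumptions for illustration.

```python
from datetime import date


def time_split(rows, cutoff):
    """Split (timestamp, features, target) rows so all training rows come strictly before `cutoff`."""
    train = [row for row in rows if row[0] < cutoff]
    valid = [row for row in rows if row[0] >= cutoff]
    return train, valid


def shared_feature_rows(train_rows, test_rows):
    """Return test feature tuples that also appear verbatim in train (possible duplicate-row leak)."""
    seen = {tuple(features) for _, features, _ in train_rows}
    return [tuple(features) for _, features, _ in test_rows if tuple(features) in seen]


# Hypothetical example data: (date, features, target)
rows = [
    (date(2020, 1, 1), (1, 2), 0),
    (date(2020, 2, 1), (3, 4), 1),
    (date(2020, 3, 1), (1, 2), 0),
    (date(2020, 4, 1), (5, 6), 1),
]
train, valid = time_split(rows, cutoff=date(2020, 3, 1))
print(len(train), len(valid))             # 2 2
print(shared_feature_rows(train, valid))  # [(1, 2)]
```

A random shuffle split on the same rows could put April in train and January in validation, which quietly lets the model learn from the future.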

Questions and Investigation
What are "ground truths"?

corteges - what is this word

/Coursera's Competitive Data Science Course

Reading Room
A good overview of the data science cycle in a general sense: https://cloud.google.com/ml-engine/docs/tensorflow/data-prep
What a Deep Neural Network thinks about your #selfie
Detecting tanks https://www.jefftk.com/p/detecting-tanks
https://analyticsdefined.com/mining-enron-emails/
https://www.coursera.org/learn/competitive-data-science/lecture/5w9Gy/basic-data-leaks
https://opendatascience.com/blog/
Kaggle competitions: https://www.kaggle.com/
University of Toronto Machine Learning http://www.learning.cs.toronto.edu/theses.html
Past solutions

http://ndres.me/kaggle-past-solutions/
https://www.kaggle.com/wiki/PastSolutions
http://www.chioka.in/kaggle-competition-solutions/
https://github.com/ShuaiW/kaggle-classification/
https://towardsdatascience.com/how-to-use-dataset-in-tensorflow-c758ef9e4428

https://towardsdatascience.com/how-to-train-neural-network-faster-with-optimizers-d297730b3713

NIPS - Neural Information Processing Systems
2015 https://nips.cc/Conferences/2015
2016 https://nips.cc/Conferences/2016
Demos and Labs
https://codelabs.developers.google.com/codelabs/scd-babyweight2/index.html#0

https://github.com/GoogleCloudPlatform/training-data-analyst

JAX Quick start
Use your GPU / TPU for ML:
https://jax.readthedocs.io/en/latest/notebooks/quickstart.html
https://github.com/cbrownley/foundations-for-analytics-with-python
Image processing
Christopheraburns / gluoncv-yolo-playing_cards
https://github.com/Christopheraburns/gluoncv-yolo-playing_cards/blob/master/Yolov3.ipynb
Chapter
https://github.com/FlorianMuellerklein/Machine-Learning

Improving our neural network (96% MNIST) https://databoys.github.io/ImprovingNN/

https://iamtrask.github.io/2015/07/12/basic-python-network/

https://plot.ly/python/create-online-dashboard/

https://www.anaconda.com/download/

http://jupyter.org/install.html

https://medium.com/towards-data-science/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464

linear regression in 6 lines of code
source: https://towardsdatascience.com/linear-regression-in-6-lines-of-python-5e1d0cd05b8d

 pip install scikit-learn pandas matplotlib

<hr>

<pre>
import numpy as np
import matplotlib.pyplot as plt  # To visualize
import pandas as pd  # To read data
from sklearn.linear_model import LinearRegression
data = pd.read_csv('data.csv')  # load data set
X = data.iloc[:, 0].values.reshape(-1, 1)  # first column = feature; .values converts it to a numpy array
Y = data.iloc[:, 1].values.reshape(-1, 1)  # second column = target; -1 lets numpy infer the number of rows, with 1 column
linear_regressor = LinearRegression()  # create the model object
linear_regressor.fit(X, Y)  # perform linear regression (ordinary least squares)
Y_pred = linear_regressor.predict(X)  # make predictions
plt.scatter(X, Y)
plt.plot(X, Y_pred, color='red')
plt.show()
</pre>

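For a single feature, the ordinary least squares fit has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A plain-Python sketch of that formula (the function name is hypothetical); on the same data it should agree with `LinearRegression`'s `coef_` and `intercept_` up to floating-point error.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

With more than one feature there is no per-feature shortcut like this; the general solution solves the normal equations (or uses a least-squares solver) over the whole feature matrix.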

D3 visualizations in jupyter notebooks: https://medium.com/@stallonejacob/d3-in-juypter-notebook-685d6dca75c8