TMLS2020: Difference between revisions

Latest revision as of 15:29, 4 December 2020

Overview

Session list: https://torontomachinelearning.com/2020-conference/

Slack servers:

https://mlopslive.slack.com
https://tmls.slack.com

/TMLS2020 Youtube Playlist

Tools

How did they run the conference?

booking: Eventbrite

sessions: Hopin

community (chat / agenda / Q&A): Whova

Notes from chat channels

"gradient accumulation"
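Gradient accumulation, roughly: you run several small micro-batches and add up their gradients, then take a single optimizer step. The update then matches a larger batch that would not fit in memory. Below is a minimal plain-Python sketch. It assumes a toy 1-D linear model with squared error, and the data, learning rate and function names are made up for illustration; none of it comes from a talk.

```python
def grad_mse(w, b, batch):
    """Gradient of mean squared error for y_hat = w * x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def accumulated_step(w, b, batch, micro_size, lr):
    """One parameter update whose gradient is accumulated over micro-batches."""
    acc_w = acc_b = 0.0
    for start in range(0, len(batch), micro_size):
        micro = batch[start:start + micro_size]
        gw, gb = grad_mse(w, b, micro)   # only `micro` has to be "in memory"
        share = len(micro) / len(batch)  # weight by micro-batch size
        acc_w += gw * share
        acc_b += gb * share
    return w - lr * acc_w, b - lr * acc_b  # a single step per full batch


# Hypothetical toy data: y = 3x + 1
data = [(x, 3 * x + 1) for x in range(8)]
w, b = 0.0, 0.0
for _ in range(3000):
    w, b = accumulated_step(w, b, data, micro_size=2, lr=0.01)
print(round(w, 3), round(b, 3))  # 3.0 1.0
```

Each micro-batch is weighted by its size. That keeps the accumulated gradient equal to the full-batch gradient even when the last micro-batch is smaller than the others.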

"I'm a big fan of Tibshirani's work with L1 and L2 regularization."
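On the L1 vs L2 comment: with an orthonormal design (X^T X = I), both penalties have closed forms in terms of the ordinary least-squares coefficients, which makes the difference easy to see. The sketch assumes the objectives 1/2 ||y - X b||^2 + lam * ||b||_1 (lasso, L1) and 1/2 ||y - X b||^2 + lam/2 * ||b||_2^2 (ridge, L2). The coefficient values are made up.

```python
import math


def lasso_orthonormal(ols_coefs, lam):
    """L1 (lasso) under X^T X = I: soft-threshold each OLS coefficient by lam."""
    out = []
    for b in ols_coefs:
        shrunk = max(abs(b) - lam, 0.0)
        # exact zeros are possible: this is the "variable selection" effect
        out.append(math.copysign(shrunk, b) if shrunk > 0 else 0.0)
    return out


def ridge_orthonormal(ols_coefs, lam):
    """L2 (ridge) under X^T X = I: scale each OLS coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in ols_coefs]


ols = [3.0, -0.4, 0.2, -2.0]  # hypothetical OLS estimates
print(lasso_orthonormal(ols, 0.5))  # [2.5, 0.0, 0.0, -1.5]
print(ridge_orthonormal(ols, 0.5))  # every coefficient shrunk, none exactly zero
```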

What are people working on?

https://medium.com/alphabyte-research-lab/tracking-parliament-with-machine-learning-part-1-background-35655ac91bca

Nov 16th

Workshops

Topic: Workshop: MLOps & Automation Workshop: Bringing ML to Production in a Few Easy Steps

Time: Nov 16, 2020 09:00 AM Eastern Time (US and Canada)

who: Yaron Haviv https://medium.com/@yaronhaviv

Tools:

MLRun - lots of end-to-end demos (/demos)
Nuclio
Kubeflow

What it gives us:

CI/CD for ML
auditability
drift
feature store for metadata and drift

Topic: Reaching Lightspeed Data Science: ETL, ML, and Graph with NVIDIA RAPIDS

Time: Nov 16, 2020 11:00 AM Eastern Time (US and Canada)

https://github.com/dask/dask-tutorial

who: Bradley Rees (NVIDIA)

An earlier version of this presentation, given by someone else at NVIDIA: https://www.youtube.com/watch?v=GVUA3vSPzio

https://medium.com/rapids-ai/gpu-dashboards-in-jupyter-lab-757b17aae1d5

https://nvidia.github.io/spark-rapids/

file formats:

h5 - for genetic info.

h5ad - compressed, annotated version (AnnData)

HLCA lung (Human Lung Cell Atlas)

https://github.com/rapidsai-community/notebooks-contrib/blob/branch-0.14/conference_notebooks/KDD_2020/notebooks/Lungs/hlca_lung_gpu_analysis.ipynb

genetic analysis, 2D visualization

UMAP

Managing Data Science in the Enterprise

who:

Randi J Ludwig, Sr. Manager Applied Data Scientist - Dell Technologies
Joshua Poduska - Chief Data Scientist - Domino Data Lab

The age-old “discipline” problem. This is not a data science problem and not a technology problem; it is a human problem.

"Paved paths"

Cathy O'Neil, Weapons of Math Destruction

Nov 17th

Bonus Workshop: How to Automate Machine Learning With GitHub Actions

11:00 am

Boring: "what is Docker?" Skipped it.

Black-Box and Glass-Box Explanation in Machine Learning

who: Dave Scharbach

who: Rich Caruana

gradient boosted decision trees

SHAP values (Shapley values): the difference between the average effect and the effect for the given value.

linear model.
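Below is a brute-force sketch of the Shapley value idea. It assumes the "interventional" convention: a coalition's value is the model's average output over a background dataset, with the coalition's features fixed to the row being explained. Because it enumerates every coalition, it only works for a handful of features; the shap package uses much faster algorithms, such as its tree explainer. The model, background rows and function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values for row x. Coalition value v(S): mean of f over the
    background rows, with the features in S overwritten by x's values."""
    n = len(x)

    def v(S):
        total = 0.0
        for row in background:
            z = [x[j] if j in S else row[j] for j in range(n)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = set(S)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi


def linear_model(z):  # hypothetical model: 2*z0 - z1 + 0.5*z2 + 3
    return 2 * z[0] - z[1] + 0.5 * z[2] + 3


background = [[0, 0, 0], [2, 2, 2], [1, 4, 0]]  # made-up background data
x = [3, 1, 2]
phi = shapley_values(linear_model, x, background)
print([round(p, 4) for p in phi])  # [4.0, 1.0, 0.6667]
```

For a linear model, each value reduces to w_i * (x_i - mean(x_i)). The values always sum to f(x) minus the average prediction, which is the walk from base value to prediction that a waterfall plot draws.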

use a "partial dependence plot" to see the effect of one input.
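A partial dependence curve can be computed by brute force. For each value on a grid, overwrite that one input in every row, then average the model's predictions. This is a minimal sketch; the model and data are made up.

```python
def partial_dependence(predict, data, feature, grid):
    """PDP for one feature: for each grid value, force that feature to the value
    in every row and average the model's predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(data))
    return curve


def model(z):  # hypothetical model: quadratic in feature 0, linear in feature 1
    return z[0] ** 2 + z[1]


data = [[1, 0], [2, 2], [3, 4]]  # made-up rows
print(partial_dependence(model, data, feature=0, grid=[0, 1, 2]))  # [2.0, 3.0, 6.0]
```

Because it averages over rows, a PDP can hide effects that differ between subgroups.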

tree explainer for, say, a GBM model.

two matrices: output matrix and explainer matrix

output matrix has the same metrics, and the explainer matrix has multiple metrics.

beeswarm plot

when you look at global summary stats, you wash away rare, high-magnitude effects.
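Here is a toy illustration of that point, using made-up SHAP values for 100 rows. On mean |SHAP|, a feature that nudges everyone a little beats a feature that matters enormously for one person.

```python
def summarize(shap_column):
    """Two global summaries of one feature's per-row SHAP values."""
    magnitudes = [abs(v) for v in shap_column]
    return {"mean_abs": sum(magnitudes) / len(magnitudes), "max_abs": max(magnitudes)}


# Hypothetical SHAP values for 100 rows
common_small = [0.25] * 100       # every row nudged a little
rare_large = [0.0] * 99 + [10.0]  # one row pushed a lot
print(summarize(common_small))  # {'mean_abs': 0.25, 'max_abs': 0.25}
print(summarize(rare_large))    # {'mean_abs': 0.1, 'max_abs': 10.0}
```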

vertical dispersion.

interaction effects: age and gender; there are some subtle bits.

we often see "treatment effects": if your BUN (blood urea nitrogen) is higher than X then you get treatment Y, ergo the graph of BUN level has "notches" in it where treatments (drugs, dialysis, etc.) are triggered.

EBM - Explainable Boosting Machine; versions for R and Spark are on the way.

Project: Microsoft's "interpret" package (EBM).

https://github.com/interpretml/interpret
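As I understand the interpret docs, an EBM is a generalized additive model: prediction = intercept + f1(x1) + f2(x2) + ... Each shape function is learned by boosting one feature at a time, in round-robin order, with a small learning rate. The real thing also uses bagging and optional pairwise interaction terms, both omitted here. The sketch below is a heavily simplified version of that training loop, not the interpret implementation. It assumes the features are already binned and uses a per-bin mean residual in place of a shallow tree. The data is made up.

```python
from collections import defaultdict


def fit_cyclic_additive(rows, ys, n_rounds=200, lr=0.1):
    """Round-robin boosting of one shape function per (already binned) feature."""
    n_features = len(rows[0])
    intercept = sum(ys) / len(ys)
    shapes = [defaultdict(float) for _ in range(n_features)]  # bin -> contribution

    def predict(row):
        return intercept + sum(shapes[j][row[j]] for j in range(n_features))

    for _ in range(n_rounds):
        for j in range(n_features):  # visit features one at a time, in a cycle
            sums, counts = defaultdict(float), defaultdict(int)
            for row, y in zip(rows, ys):
                sums[row[j]] += y - predict(row)  # residual of the current model
                counts[row[j]] += 1
            # simplified "tree" for feature j: per-bin mean residual, shrunk by lr
            for bin_value, total in sums.items():
                shapes[j][bin_value] += lr * total / counts[bin_value]
    return shapes, predict


# Hypothetical additive ground truth on two binned features
true_f0 = {0: 0.0, 1: 1.0, 2: 4.0}
true_f1 = {0: 0.0, 1: -2.0}
rows = [(a, c) for a in range(3) for c in range(2)] * 2
ys = [true_f0[a] + true_f1[c] for a, c in rows]
shapes, predict = fit_cyclic_additive(rows, ys)
print(round(shapes[0][2] - shapes[0][0], 3))  # 4.0
```

Each learned shape function can be plotted directly, which is the "glass box" part. For real use, the interpret package exposes this as `interpret.glassbox.ExplainableBoostingClassifier` / `ExplainableBoostingRegressor`.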

"How do you explain a model?"

also see SHAP package. https://github.com/slundberg/shap

shap.plots.waterfall() neato!

Nov 18th

Customer Segmentation, Pricing, and Profit Optimization for International Banking

who:

Shirin Akbarinasaji (Speaker) Senior Data Scientist, Scotiabank
Navid Kaihanirad (Speaker) Senior Data Scientist, Scotiabank
Cheng Chen (Speaker) Data Scientist, Scotiabank

WTP: willingness to pay

price response function
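The usual link between these two terms in pricing analytics is that the price response function equals the market size times the fraction of customers whose willingness to pay is at least the price: d(p) = D * P(WTP >= p). Below is a minimal sketch with made-up WTP values that picks the revenue-maximizing price from a candidate list. The function names are hypothetical.

```python
def price_response(price, wtp_samples, market_size):
    """Demand at `price` = market size x share of customers whose WTP >= price."""
    share = sum(1 for w in wtp_samples if w >= price) / len(wtp_samples)
    return market_size * share


def revenue_maximizing_price(candidates, wtp_samples, market_size):
    """Pick the candidate price with the highest revenue p * d(p)."""
    return max(candidates, key=lambda p: p * price_response(p, wtp_samples, market_size))


wtp = [10, 20, 30, 40, 50]  # hypothetical willingness-to-pay values
print(revenue_maximizing_price([10, 20, 30, 40, 50], wtp, market_size=100))  # 30
```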

9:25 am - 10:10 am

Abstract

Background: Pricing is a famous business issue in many companies and organizations. The approach behind pricing analytics can be formulated as customer segmentation and constrained optimization problems in order to increase sales and/or revenue.

Aim: The main objective is to design a pricing product that can help to :

1) Identify groups of elastic and inelastic customers,

2) Determine the optimal rate for each group of customers,

3) Provide an agnostic pipeline that can be reused for other pricing use cases.

Methodology: Scotiabank proposes to use model-based recursive partitioning (MOB), which uses product characteristics and customer attributes as input and customer willingness to pay as output to segment customers. For each customer segment, the company found the demand curve function and formulated the nonlinear optimization problem that maximizes sales or revenue using PYOMO and IPOPT.
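In the talk, the per-segment optimization was done with Pyomo and the IPOPT solver. As a stand-in, the sketch below assumes a hypothetical logit price-response curve for each segment and maximizes revenue p * d(p) with a plain golden-section search. The segment names and parameters are made up, and the segmentation step (MOB) is not shown.

```python
import math


def logit_demand(price, market_size, slope, midpoint):
    """Hypothetical logit price-response: demand falls from market_size toward 0
    around `midpoint`; a larger `slope` means a more price-sensitive segment."""
    return market_size / (1.0 + math.exp(slope * (price - midpoint)))


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):  # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:            # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0


segments = {  # made-up parameters, one demand curve per segment
    "elastic": {"market_size": 1000, "slope": 2.0, "midpoint": 5.0},
    "inelastic": {"market_size": 1000, "slope": 0.3, "midpoint": 5.0},
}
optimal_price = {}
for name, seg in segments.items():
    def revenue(p, seg=seg):
        return p * logit_demand(p, seg["market_size"], seg["slope"], seg["midpoint"])
    optimal_price[name] = golden_section_max(revenue, 0.0, 20.0)
print({k: round(v, 2) for k, v in optimal_price.items()})
```

The more price-sensitive (elastic) segment ends up with the lower revenue-maximizing price. At the optimum, this logit curve satisfies slope * p * (1 - share) = 1, which makes a handy sanity check.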

Results: This pricing product has been used in three different countries (Peru, Colombia, and Mexico) in various products such as mortgage, SPL, and term deposit, with great feedback that has helped Scotiabank capture international banking customer behavior and price sensitivity more promptly.

Currently, this application is within the Bank’s international banking (IB) footprint; however, the solutions are reusable and scalable for application within the Canadian marketplace.

rough:

Q: how do you choose the objective within the Segmentation / Recursive Partitioning? How does that relate to the objective chosen in optimizing the demand function?

Q: is there a demand function per market segment?
A: we have a demand function for each segment.

What tools / libraries:

decision tree +
max steps, mean size
parametric model
also there is an alpha function

Tools:
R, and rpy2 to integrate R with the Python pipeline

Q: Are macro-economic variables (i.e. Unemployment rate, market volatility etc) incorporated in simulating Demand/Price estimation? If so, at what step in the pipeline is it incorporated?
A: nope. not generally.

external factors, macroeconomic: like employment / GDP
A: we are using some, for creating an index rate.

Q: How computationally heavy is "permuting over features/parameters" using grid search?


Q: how do you handle a customer's change in segment?

A: we use a retraining schedule. Each campaign is "stand-alone".

Q: how do you account for competitor pricing ?

A:

Algorithmic Decision Making: Exploring Practical Approaches to Liability, Fairness, and Explainability

10am

who:

Patrick Hall (Speaker) Principal Scientist, bnh.ai
Talieh Tabatabaei (Speaker) Data Scientist, TD Bank
Richard Zuroff (Speaker) Advisor, Element AI

references:

Underspecification Presents Challenges for Credibility in Modern Machine Learning
https://arxiv.org/pdf/2011.03395.pdf

https://ai.googleblog.com/2019/02/learning-to-generalize-from-sparse-and.html

Reinforcement Learning via Stochastic Control

who:

Xunyu Zhou (Speaker) Professor, Department of IEOR, Columbia University

This was pretty dry; I skipped it for the next ...

Productionizing Deep Learning Models at Scale

MLOps

who: Dillon Erb (Speaker) CEO and Cofounder, Paperspace

"Machine learning is a software engineering practice"

The Role of ML in Climate Change

who: Sedef Akinli Kocak

I think it is our responsibility to address the potential negative impacts of the 2nd- and 3rd-order effects, including rebound effects, of the systems we design, as part of our design process, even if these are not readily quantifiable.

who:

Sasha Luccioni
Arthur Berrill
Jules Andrew
Patricia Thaine

Notes:

making your algos use less energy; calculating their carbon footprint.
getting green compute.
examples: using ML to make things that consume resources or have a carbon footprint more efficient.
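This is a back-of-the-envelope version of "calculate carbon footprint". Energy is average power draw times runtime, scaled by the data centre's power usage effectiveness (PUE). Emissions are that energy times the carbon intensity of the grid. Tools such as CodeCarbon automate estimates of this kind. Every number below (wattage, PUE, grid intensities) is a made-up assumption.

```python
def energy_kwh(avg_power_watts, hours, pue):
    """Facility energy: device draw x runtime, scaled by data-centre PUE."""
    return avg_power_watts * hours / 1000.0 * pue


def emissions_kg(avg_power_watts, hours, pue, grid_kg_per_kwh):
    """CO2-equivalent estimate: energy x carbon intensity of the local grid."""
    return energy_kwh(avg_power_watts, hours, pue) * grid_kg_per_kwh


# Hypothetical run: 4 GPUs at ~300 W each for 24 h in a PUE-1.5 facility
print(round(energy_kwh(4 * 300, 24, 1.5), 2))          # 43.2
print(round(emissions_kg(4 * 300, 24, 1.5, 0.4), 2))   # 17.28 (made-up carbon-heavy grid)
print(round(emissions_kg(4 * 300, 24, 1.5, 0.03), 2))  # 1.3 (made-up low-carbon grid)
```

The same job can differ by an order of magnitude in emissions depending only on where it runs, which is the "getting green compute" point.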

https://www.climatechange.ai/summaries

https://chrome.google.com/webstore/detail/neutral/oagdejngkgbfnaankoehhanicaodcdpd?hl=en

CodeCarbon - https://github.com/mlco2/codecarbon

A Cookbook for Deep Continuous-Time Predictive Models

who: David Duvenaud

http://www.cs.toronto.edu/~duvenaud/

https://duvenaud.github.io/sta414/lec8-autodiff.pdf

irregularly timed data sets: how to handle them without binning

meet the data where it is.

Kalman filters - not so good; Gaussian processes.

Kaplan–Meier
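The Kaplan-Meier estimator of a survival curve handles right-censored records: S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number still at risk. This is a minimal sketch with made-up durations. Records censored at the same time as an event count as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.
    events[i] is 1 if the event was observed at durations[i], 0 if censored."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # both leave the risk set after t
    return curve


durations = [1, 2, 2, 3, 4, 5]  # hypothetical follow-up times
events = [1, 1, 0, 1, 0, 1]     # 0 = censored (e.g. lost to follow-up)
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 3))
```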

ordinary differential equations
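One way to read "meet the data where it is": if the hidden state follows an ordinary differential equation, you can integrate it to each observation time exactly as recorded, with no binning. The sketch assumes known, made-up dynamics dx/dt = -0.7x, so the answer can be checked against the closed form. In a neural ODE, f would instead be a neural network whose parameters are learned. The observation times are hypothetical.

```python
import math


def rk4_step(f, t, x, h):
    """One classical Runge-Kutta (RK4) step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def solve_at(f, x0, t0, obs_times, max_step=0.01):
    """Integrate from t0 and return the state at each sorted observation time.
    The last step before each observation is shortened so the time is hit exactly."""
    states, t, x = [], t0, x0
    for target in obs_times:
        while t < target:
            h = min(max_step, target - t)
            x = rk4_step(f, t, x, h)
            t = target if h == target - t else t + h
        states.append(x)
    return states


def decay(t, x):  # stand-in dynamics (assumption); a neural ODE would learn this
    return -0.7 * x


obs_times = [0.13, 0.5, 0.51, 2.2, 3.9]  # irregular, unbinned timestamps
for t, x in zip(obs_times, solve_at(decay, 2.0, 0.0, obs_times)):
    print(t, round(x, 6), round(2.0 * math.exp(-0.7 * t), 6))  # numeric vs exact
```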

stochastic gradient descent

stochastic differential equations

Gaussian noise - go read about this.

Nov 19th

Applied ML in Healthcare - Practical & Legal Considerations

Speakers

Mary Jane Dykeman (Speaker) Partner & Co-Founder, INQ Data Law

Muhammad Mamdani, PharmD, MA, MPH (Speaker) Vice President, Data Science and Advanced Analytics, Unity Health Toronto

Description

Abstract: Applied machine learning has the potential to transform healthcare, particularly in the areas of automation, prediction, and optimization. However, numerous challenges to the acquisition, storage, and utilization of data as well as the development of practical machine learning algorithms and change management principles need to be considered. This talk will provide an overview of the process of applying ML into healthcare and the legal and ethical considerations needed for data access and application.

What You Will Learn: Attendees will gain an understanding of the principles of knowledge translation in applied machine learning in healthcare and understand issues related to privacy and ethics as well as legal considerations.

St. Michael's Hospital
can we predict who will die earlier?
answer: monitor staff and patients to predict death.
ethical: act or not act? controls?

Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead

Advanced Research

who: Cynthia Rudin (Speaker) Professor of Computer Science, Electrical and Computer Engineering and Statistical Science, Duke University

quote: "I've never met a data set I can trust."

https://www.nature.com/articles/s41598-019-39071-y

Harnessing the Power of NLP: A Vector Institute Industry

Vector Institute

Neural-Symbolic AI for Creativity, Generalization and Transfer Learning

Advanced Research

who: Ben Goertzel (Speaker) SingularityNET, CEO

"make a vector of each node in the graph, and then do vector math"

man - woman ≈ king - queen
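This is the classic word-embedding version of that vector arithmetic: king - man + woman lands nearest to queen, or equivalently man - woman ≈ king - queen. The 3-D vectors below are hand-made and purely hypothetical; real embeddings, whether of words or graph nodes, are learned and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(vectors, a, b, c):
    """Solve a : b :: c : ? as the nearest neighbour (by cosine) of b - a + c,
    excluding the three input words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical hand-made embeddings: [male, female, royalty]
vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king": [1.0, 0.0, 0.9],
    "queen": [0.0, 1.0, 0.9],
    "apple": [0.5, 0.5, 0.0],
}
print(analogy(vectors, "man", "king", "woman"))  # queen
```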

Link dump

https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai

https://blog.dominodatalab.com/measuring-data-science-business-value/

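https://www2.slideshare.net/MiguelFierro1/knowledge-graph-recommendation-systems-for-covid19

https://rajpurkar.github.io/SQuAD-explorer/

https://worksheets.codalab.org/worksheets/0x62eefc3e64e04430a1a24785a9293fff/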



