TMLS2020: Difference between revisions
Revision as of 16:14, 18 November 2020
Notes from chat channels
What are people working on?
Nov 16th
Workshops
Topic: Workshop: MLOps & Automation Workshop: Bringing ML to Production in a Few Easy Steps
Time: Nov 16, 2020 09:00 AM Eastern Time (US and Canada)
who: Yaron Haviv https://medium.com/@yaronhaviv
Tools:
- mlrun - lots of end to end demos /demos
- nuclio
- kubeflow
What it gives us:
- CICD for ML
- auditability
- drift
- feature store for meta data, and drift.
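A rough sketch of what a drift check can look like. The workshop notes do not say which metric the tools use; the Population Stability Index (PSI) below is an assumption chosen for illustration, and the function name `psi`, the bin count, and the 0.25 alert threshold are all hypothetical.

```python
import math

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical data: a training-time feature and a shifted production feature.
reference = [i / 100 for i in range(1000)]
shifted = [x + 5 for x in reference]
drift_detected = psi(reference, shifted) > 0.25  # 0.25 is a commonly quoted rule of thumb
```

In a pipeline this kind of check would run per feature on a schedule, with the reference sample coming from the training data.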
Topic: Reaching Lightspeed Data Science: ETL, ML, and Graph with NVIDIA RAPIDS
Time: Nov 16, 2020 11:00 AM Eastern Time (US and Canada)
https://github.com/dask/dask-tutorial
who: Bradley Rees ( from Nvidia )
https://medium.com/rapids-ai/gpu-dashboards-in-jupyter-lab-757b17aae1d5
https://nvidia.github.io/spark-rapids/
file formats:
h5 - for genetic info.
h5ad - compressed annotated version
nica lung
genetic analysis, 2d visualization
UMAP
- https://www.nature.com/articles/s41467-020-15351-4
- https://umap-learn.readthedocs.io/en/latest/basic_usage.html
Managing Data Science in the Enterprise
who:
- Randi J Ludwig, Sr. Manager Applied Data Scientist - Dell Technologies
- Joshua Poduska - Chief Data Scientist - Domino Data Lab
The age old “discipline” problem. This is not a data science problem and this is not a technology problem, this is a human problem.
"Paved paths"
Cathy O'Neil, "Weapons of Math Destruction"
Nov 17th
Bonus Workshop: How to Automate Machine Learning With GitHub Actions
11:00 am
boring: what is docker? Skipped it.
Black-Box and Glass-Box Explanation in Machine Learning
who: Dave Scharbach, Rich Caruana
gradient boost decision tree
SHAP values (Shapley) - the difference between the average effect and the effect for the value.
linear model.
use a "partial dependence plot" to see the effect of one input.
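A minimal sketch of how a partial dependence curve is computed: force one input to each grid value for every row, then average the predictions. The model `toy_model` and the data are made up for illustration; real libraries add sampling and plotting on top of this idea.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite one input, keep the others as observed
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Hypothetical model: linear in input 0, quadratic in input 1.
def toy_model(x):
    return 2.0 * x[0] + x[1] ** 2

rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
pd_curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
```

For this toy model the curve rises by 2.0 per unit of input 0, which is exactly the linear coefficient; the other input only shifts the curve up by its average contribution.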
tree explainer for say GBM model.
two matrices: output matrix and explainer matrix
output matrix has the same metrics, and explainer matrix has multiple metrics.
beeswarm plot
when you look at global summary stats, you wash away rare high-magnitude effects.
vertical dispersion.
interaction effect, age and gender, there are some subtle bits.
we often see "treatment effects": if your BUN is higher than X then you get treatment Y, ergo the graph of BUN level has "notches" in it where treatments are triggered, drug .. dialysis, etc.
EBM - Explainable Boosting Machine - some in R and Spark on the way.
Project: Microsoft's "interpret" package. EBM
https://github.com/interpretml/interpret
"How do you explain a model?"
also see SHAP package. https://github.com/slundberg/shap
shap.plots.waterfall() neato!
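To make the "difference between the average effect and the effect for the value" concrete, here is a brute-force Shapley computation in plain Python. It is a sketch for intuition only, not how the shap package works internally (it enumerates every feature ordering, so it only scales to a handful of inputs). The model `linear_model`, the background rows, and the function name `shapley_values` are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, background):
    """Exact Shapley values for one row `x`, averaging over all feature orderings."""
    n = len(x)

    def value(subset):
        # Expected prediction when features in `subset` are fixed to x
        # and the remaining features are taken from the background rows.
        total = 0.0
        for b in background:
            mixed = [x[i] if i in subset else b[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        before = value(present)
        for i in order:
            present.add(i)
            after = value(present)
            phi[i] += after - before  # marginal contribution of feature i in this ordering
            before = after
    base = value(set())  # average prediction over the background
    return [p / len(orderings) for p in phi], base

# Hypothetical linear model and data.
def linear_model(x):
    return 3.0 * x[0] - 2.0 * x[1] + 1.0

background = [[0.0, 0.0], [2.0, 4.0]]  # feature means are 1.0 and 2.0
x = [4.0, 1.0]
phi, base = shapley_values(linear_model, x, background)
```

For a linear model each value reduces to coefficient times (input minus background mean), and the base value plus the contributions adds up to the prediction for the row, which is the decomposition a waterfall plot draws.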
Nov 18th
Customer Segmentation, Pricing, and Profit Optimization for International Banking
who:
- Shirin Akbarinasaji (Speaker) Senior Data Scientist, Scotiabank
- Navid Kaihanirad (Speaker) Senior Data Scientist, Scotiabank
- Cheng Chen (Speaker) Data Scientist, Scotiabank
WTP - willingness to pay
price response function
9:25 am - 10:10 am
- Abstract
Background: Pricing is a famous business issue in many companies and organizations. The approach behind pricing analytics can be formulated as customer segmentation and constrained optimization problems in order to increase sales and/or revenue.
Aim: The main objective is to design a pricing product that can help to :
1) Identify groups of elastic and inelastic customers,
2) Determine the optimal rate for each group of customers,
3) Agnostic pipeline that can be reusable for other pricing use cases.
Methodology: Scotiabank proposes to use model-based recursive partitioning (MOB), which uses product characteristics and customer attributes as input and customer willingness to pay as output to segment customers. For each customer segment, the company finds the demand curve function and formulates a nonlinear optimization problem that maximizes sales or revenue using Pyomo and IPOPT.
Results: This pricing product has been used in three different countries: Peru, Colombia, and Mexico in various products such as mortgage, SPL, and term deposit with great feedback that has helped Scotiabank to capture international banking customer behavior and their price sensitivity more promptly.
Currently, this application is within the Bank’s international banking (IB) footprint, however, solutions are reusable and scalable for application within the Canadian marketplace.
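A toy version of the per-segment step described in the abstract: one price-response (demand) function per segment, then pick the price that maximizes revenue. The talk used Pyomo and IPOPT for the optimization; the sketch below swaps in a plain golden-section search so it runs with the standard library only. The logistic demand shape, the segment names, and every parameter value are assumptions for illustration, not figures from the talk.

```python
import math

def demand(price, market_size, midpoint, steepness):
    """Logistic price-response: expected number of buyers at `price`."""
    return market_size / (1.0 + math.exp(steepness * (price - midpoint)))

def best_price(revenue, low, high, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function on [low, high]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = low, high
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if revenue(c) > revenue(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c, d = b - ratio * (b - a), a + ratio * (b - a)
    return (a + b) / 2

# Hypothetical segments: a steeper curve means more price-sensitive customers.
segments = {
    "elastic": dict(market_size=1000, midpoint=4.0, steepness=3.0),
    "inelastic": dict(market_size=1000, midpoint=6.0, steepness=0.8),
}

def revenue_for(params):
    return lambda p: p * demand(p, **params)

optimal = {name: best_price(revenue_for(s), 0.0, 20.0) for name, s in segments.items()}
```

With these made-up parameters the elastic segment ends up with the lower optimal price. Real constraints (rate floors, regulatory caps, volume targets) are what make a solver such as IPOPT worthwhile.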
rough:
Q: how do you choose the objective within the Segmentation / Recursive Partitioning? How does that relate to the objective chosen in optimizing the demand function?
Q: is there a demand function per market segment?
A: we have a demand function for each segment.
What tools / libraries: decision tree + max steps, mean size; parametric model; also there is an alpha function.
Tools: R, and rpy2 to integrate R with the Python pipeline.
Q: Are macro-economic variables (i.e. Unemployment rate, market volatility etc) incorporated in simulating Demand/Price estimation? If so, at what step in the pipeline is it incorporated?
A: nope. not generally. external factors, macro-economic: like employment / GDP
A: we are using some, for creating an index rate.
Q: How computationally heavy is "permuting over features/parameters" using grid search?
Q: how do you handle a customer's change in segmentation?
A: we use a retrain schedule. each campaign is "stand alone".
Q: how do you account for competitor pricing?
A:
Algorithmic Decision Making: Exploring Practical Approaches to Liability, Fairness, and Explainability
10am
who:
- Patrick Hall (Speaker) Principal Scientist, bnh.ai
- Talieh Tabatabaei (Speaker) Data Scientist, TD Bank
- Richard Zuroff (Speaker) Advisor, Element AI
references:
- Underspecification Presents Challenges for Credibility in Modern Machine Learning
- https://arxiv.org/pdf/2011.03395.pdf
https://ai.googleblog.com/2019/02/learning-to-generalize-from-sparse-and.html
Reinforcement Learning via Stochastic Control
who:
- Xunyu Zhou (Speaker) Professor, Department of IEOR, Columbia University
this was pretty dry, I skipped it for the next ...
Productionizing Deep Learning Models at Scale
MLOps
who: Dillon Erb (Speaker) CEO and Cofounder, Paperspace