Machine Learning/Coursera's Competitive Data Science Course

From Federal Burro of Information
Revision as of 15:34, 14 May 2018 by David

Exam notes

I. Suppose that you have a credit scoring task, where you have to create an ML model that approximates an expert's evaluation of an individual's creditworthiness. Which of the following could potentially be a data leak? Select all that apply.

1. The first half of the data points in the train set have a score of 0, while the second half have scores > 0. (This should be selected.)
This is a leak.
2. Among the features you have a company_id, an identifier of the company where this person works. It turns out that this feature is very important, and adding it to the model significantly improves your score. (Leaving this un-selected is correct.)
Not a source of leakage: the employer is an ordinary feature that would also be known when scoring a new applicant.
3. An ID of a data point (row) in the train set correlates with target variable.
Explanation: The data was not shuffled; this information cannot be used in a real-world scenario.
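
The row-order leak described in the explanation can be checked with a short script. The following is only a minimal sketch, not part of the course material: the toy scores, the helper names pearson and row_order_correlation, and the fixed random seed are all assumptions made for illustration. It builds data shaped like option 1 (zeros first, positive scores after), measures how strongly the row position correlates with the target, and then repeats the measurement after shuffling.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def row_order_correlation(targets):
    """Correlation between row position (0, 1, 2, ...) and the target."""
    return pearson(list(range(len(targets))), targets)


# Hypothetical toy data: the first half is scored 0, the second half > 0.
rng = random.Random(0)
targets = [0.0] * 500 + [rng.uniform(0.1, 1.0) for _ in range(500)]

# With the original row order, position alone says a lot about the target.
ordered_corr = row_order_correlation(targets)

# Shuffling the rows breaks the link between position and target.
shuffled = targets[:]
rng.shuffle(shuffled)
shuffled_corr = row_order_correlation(shuffled)

print(f"ordered:  {ordered_corr:.3f}")
print(f"shuffled: {shuffled_corr:.3f}")
```

A strong correlation in the ordered case, dropping to near zero after shuffling, suggests that the row ID only carries information because of how the file was sorted. A model could exploit that ordering on the training set, but it would have nothing comparable to rely on for new applicants.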