B9BA102: Applied Statistics & Machine Learning Assignment, DBS, Ireland

Learning Outcomes

1. Analyze a dataset from a problem domain in depth, and select appropriate statistical models, tools, and techniques to derive insights regarding the dataset and domain.

2. Effectively extract, transform, interrogate, and analyze large datasets.

3. Construct, refine, interpret, and critically evaluate predictive analytical and machine learning models.

4. Critically evaluate and utilize hyperparameter search strategies for optimizing machine learning models.


The bank wants to use a classification model that can predict customer churn. Construct a suitable classification model for the bank by implementing both random forest and support vector classification algorithms in Python.
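As a starting point, the two required algorithms could be fitted side by side as in the minimal sketch below. It assumes scikit-learn is used, and because the brief does not include the bank's dataset, it substitutes synthetic data from `make_classification` with roughly 20% churners. The feature counts, class balance, and model settings are illustrative assumptions, not values taken from the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the bank's churn data (about 20% churners).
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=42,
)

# Stratify so both splits keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Tree ensembles do not need feature scaling.
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
    # SVMs are sensitive to feature scale, so scale inside a pipeline.
    "svc": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", class_weight="balanced")
    ),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Accuracy is only a first check; it can mislead on imbalanced churn data.
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {test_accuracy[name]:.3f}")
```

Placing the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the support vector classifier.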

In addition to providing the Python code file, you are required to provide a critical analysis of your approach and results in a PDF report.

Your code and analysis should cover the following points:
1. Data Preparation (What steps would you take to prepare your data? Discuss your approach)

2. Model Hyperparameter Tuning (Which hyperparameters would you tune and why? How would you tune them?)

3. Choice of Evaluation Metric (Which metric would be suitable for model evaluation and why?)

4. Overfitting Avoidance Mechanism (Which mechanism, feature selection or regularization, would you use and why?)

5. Results Analysis
a) Which of the two models (random forest or support vector classifier) would you recommend for deployment in the real world?

b) Is any model underfitting? If yes, what could be the possible reasons?
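The points above (tuning, metric choice, overfitting control, and the comparison of results) could be explored with a sketch along the following lines. It is one possible approach under stated assumptions, not a model answer: the data are synthetic, the grids are deliberately tiny, and F1 on the churn class is used only as an example metric for imbalanced data. Comparing the mean training score with the cross-validated score gives a rough signal: a large gap suggests overfitting, while low scores on both suggest underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the bank's churn data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

searches = {
    # max_depth and min_samples_leaf limit tree complexity (regularization).
    "random_forest": GridSearchCV(
        RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=0
        ),
        param_grid={"max_depth": [3, None], "min_samples_leaf": [1, 5]},
        scoring="f1", cv=cv, return_train_score=True,
    ),
    # C sets the regularization strength; gamma sets the RBF kernel width.
    "svc": GridSearchCV(
        Pipeline([
            ("scale", StandardScaler()),
            ("svc", SVC(kernel="rbf", class_weight="balanced")),
        ]),
        param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
        scoring="f1", cv=cv, return_train_score=True,
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)
    best = search.best_index_
    results[name] = {
        "best_params": search.best_params_,
        # Mean F1 on the training folds for the best configuration.
        "train_f1": float(search.cv_results_["mean_train_score"][best]),
        # Mean F1 on the held-out validation folds.
        "cv_f1": float(search.best_score_),
        # F1 on the untouched test split, using the refitted best model.
        "test_f1": float(f1_score(y_test, search.predict(X_test))),
    }
    print(name, results[name])
```

A grid search is used here for simplicity; with larger search spaces, `RandomizedSearchCV` from the same module is a common alternative that samples a fixed number of configurations instead of trying all of them.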
