Supervised Machine Learning: An Experiential and Applied Session Custom Case Solution & Analysis
Evidence Brief: Case Data Extraction
Financial Metrics
- Total Customer Base: 5,000 records.
- Historical Conversion Rate: 9.6 percent (480 customers accepted a personal loan in the previous campaign).
- Campaign Cost Structure: The case does not explicitly state the dollar cost per contact, but emphasizes the need to reduce the volume of unsuccessful solicitations.
- Revenue Drivers: Interest income from personal loans and fee-based services for liability customers.
Operational Facts
- Data Dimensions: 14 variables, including ID, Age, Experience, Income, ZIP Code, Family size, average monthly credit card spending (CCAvg), Education level, Mortgage value, and existing account types (Securities Account, CD Account, Online, CreditCard).
- Target Variable: Personal Loan (Binary: 1 if accepted, 0 if not).
- Technical Process: Follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework.
- Modeling Requirements: Data partitioning into Training (60 percent) and Validation (40 percent) sets, so that each model is evaluated on records it was not fitted to and overfitting can be detected.
Stakeholder Positions
- Marketing Department: Seeks to increase the success rate of personal loan offers while minimizing the annoyance to the 90.4 percent of customers who did not accept the previous offer and are unlikely to respond.
- Data Science Team: Tasked with selecting and tuning the optimal classification algorithm (k-Nearest Neighbors, Logistic Regression, or Decision Trees).
- Retail Banking Leadership: Focused on growing the loan portfolio without increasing the risk profile of the bank.
Information Gaps
- Cost of False Positives: The specific financial penalty for contacting a customer who does not convert.
- Cost of False Negatives: The opportunity cost of missing a customer who would have converted.
- Data Recency: The time interval between the collection of the training data and the planned execution of the next campaign.
- Customer Lifetime Value (CLV): The long-term profitability of a converted loan customer versus a liability-only customer.
Strategic Analysis
Core Strategic Question
- How can Universal Bank transition from mass-market solicitation to a precision-targeted acquisition model to maximize personal loan conversion while minimizing marketing waste?
Structural Analysis
The problem is a classic classification challenge. Applying the CRISP-DM framework reveals that the primary bottleneck is not data volume, but the selection of the correct features to predict conversion. The 9.6 percent baseline conversion rate indicates a high degree of class imbalance, meaning a naive model could achieve 90.4 percent accuracy by simply predicting zero for everyone. The strategic focus must shift from overall accuracy to sensitivity (recall) and the lift over the baseline.
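The accuracy trap can be made concrete with the case's own counts. The short sketch below computes the base rate, the accuracy and recall of a model that always predicts 0, and the lift of a targeted segment. The 48 percent segment response rate is a hypothetical figure chosen only to illustrate the lift formula.

```python
# Figures from the case: 5,000 customers, 480 accepted the personal loan.
total_customers = 5_000
converters = 480

base_rate = converters / total_customers   # share of positive cases
naive_accuracy = 1 - base_rate             # accuracy of always predicting 0
naive_recall = 0 / converters              # always predicting 0 finds no converters

def lift(segment_response_rate: float, overall_rate: float) -> float:
    """Lift = response rate inside a targeted segment / overall response rate."""
    return segment_response_rate / overall_rate

print(f"base rate      = {base_rate:.3f}")
print(f"naive accuracy = {naive_accuracy:.3f}")
print(f"naive recall   = {naive_recall:.3f}")
# HYPOTHETICAL segment in which 48% of contacted customers convert
print(f"segment lift   = {lift(0.48, base_rate):.1f}")
```

A model can therefore only be judged useful if it beats the always-0 baseline on recall and lift, not merely on accuracy.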
Strategic Options
- Option 1: k-Nearest Neighbors (k-NN) Classification. This non-parametric approach classifies each customer by the outcomes of the most similar customer profiles in the training data.
- Rationale: Effective at capturing non-linear relationships between income, education, and loan acceptance.
- Trade-offs: Prediction cost grows with the size of the training set, because every new customer must be compared with all stored records, and results are sensitive to the choice of k.
- Resource Requirements: Data normalization and feature scaling.
- Option 2: Logistic Regression. A parametric model to estimate the probability of loan acceptance.
- Rationale: Provides clear coefficients that allow management to understand which factors (e.g., Income, CD Account) drive conversion.
- Trade-offs: Assumes linear relationships between features and the log-odds of the outcome.
- Resource Requirements: Statistical validation of assumptions and variable significance testing.
- Option 3: Status Quo (Broad Segment Targeting). Continuing to target customers based on simple demographic filters like Income > $100k.
- Rationale: Low technical complexity and no requirement for advanced modeling.
- Trade-offs: High waste and missed opportunities among lower-income but high-propensity segments.
- Resource Requirements: None beyond existing marketing staff.
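To illustrate what "clear coefficients" means for Option 2, the sketch below evaluates a logistic model and converts its coefficients into odds ratios. The intercept and both coefficients are hypothetical values, not estimates from the Universal Bank data; only the form of the calculation carries over.

```python
import math

# HYPOTHETICAL coefficients for illustration only (not fitted to the case data).
# Income is in $000s; CD_Account is 1 if the customer holds a CD account, else 0.
intercept = -6.0
coefficients = {"Income": 0.04, "CD_Account": 2.0}

def acceptance_probability(features: dict) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    log_odds = intercept + sum(coefficients[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratio(name: str, delta: float = 1.0) -> float:
    """Multiplicative change in the odds of acceptance when one feature rises by `delta`."""
    return math.exp(coefficients[name] * delta)

p_without_cd = acceptance_probability({"Income": 120, "CD_Account": 0})
p_with_cd = acceptance_probability({"Income": 120, "CD_Account": 1})
print(f"P(accept | income 120k, no CD) = {p_without_cd:.3f}")
print(f"P(accept | income 120k, CD)    = {p_with_cd:.3f}")
print(f"odds ratio for +$10k income    = {odds_ratio('Income', 10):.2f}")
print(f"odds ratio for holding a CD    = {odds_ratio('CD_Account'):.2f}")
```

Statements such as "holding a CD account multiplies the odds of acceptance by exp(b)" are what make this option easy to explain to management, even when k-NN predicts better.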
Preliminary Recommendation
Universal Bank should implement the k-NN model with a k-value optimized via validation error rates. While Logistic Regression offers interpretability, the primary goal is predictive performance to maximize the conversion of the next 5,000 prospects. The non-linear nature of banking behavior, where the interaction between income and family size often dictates credit needs, makes k-NN the superior choice for maximizing the lift in the top two deciles of the prospect list.
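A minimal from-scratch sketch of the recommended procedure follows: partition 60/40, scale the features using training-set statistics only, classify each validation record by majority vote among its k nearest neighbors, and keep the k with the lowest validation error rate. It assumes synthetic data with just two features (income and family size) and a made-up label rule as a stand-in for the case's 5,000 records; a production pipeline would more likely use a library such as scikit-learn.

```python
import math
import random
import statistics

# SYNTHETIC stand-in for the case data: income ($000) and family size, with a
# HYPOTHETICAL label rule that favors high income and larger families.
random.seed(7)

def make_customer():
    income = random.uniform(10, 220)
    family = random.randint(1, 4)
    signal = (income - 120) / 25 + (family - 2.5)
    label = 1 if signal + random.gauss(0, 1) > 2.0 else 0
    return [income, float(family)], label

data = [make_customer() for _ in range(1000)]
cut = int(0.6 * len(data))                  # 60/40 partition, as in the case
train, valid = data[:cut], data[cut:]

# z-score scaling, fitted on the training partition only
columns = list(zip(*(x for x, _ in train)))
means = [statistics.mean(c) for c in columns]
stdevs = [statistics.pstdev(c) for c in columns]

def scale(x):
    return [(v - m) / s for v, m, s in zip(x, means, stdevs)]

train_scaled = [(scale(x), y) for x, y in train]
valid_scaled = [(scale(x), y) for x, y in valid]

def neighbor_labels(x):
    """Training labels ordered from nearest to farthest (Euclidean distance)."""
    ranked = sorted(train_scaled, key=lambda rec: math.dist(x, rec[0]))
    return [y for _, y in ranked]

# Rank the training set once per validation record, then reuse it for every k.
ranked = [(neighbor_labels(x), y) for x, y in valid_scaled]

def validation_error(k):
    """Share of validation records misclassified by a k-NN majority vote."""
    wrong = sum((1 if sum(labels[:k]) > k / 2 else 0) != y for labels, y in ranked)
    return wrong / len(ranked)

errors = {k: validation_error(k) for k in range(1, 16, 2)}   # odd k avoids tied votes
best_k = min(errors, key=errors.get)
for k, err in errors.items():
    print(f"k={k:2d}  validation error={err:.3f}")
print("selected k:", best_k)
```

Because k is chosen on the validation set, that set's error rate is slightly optimistic; the pilot campaign provides the genuinely out-of-sample check.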
Implementation Roadmap
Critical Path
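- Phase 1: Data Preparation (Weeks 1-3). Normalize all continuous variables (Age, Income, CCAvg, Mortgage). Convert categorical variables (Education) into dummy variables. Scaling is a prerequisite for any distance-based algorithm such as k-NN; otherwise variables with large ranges, such as Income, dominate the distance calculation.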
- Phase 2: Model Training and Selection (Weeks 4-6). Run k-NN, Logistic Regression, and Classification Trees on the 60 percent training set. Evaluate performance on the 40 percent validation set using a confusion matrix.
- Phase 3: Optimization (Weeks 7-8). Focus on the Decile Lift Chart. The goal is to ensure the top 10 percent of predicted customers contain at least 50 percent of the actual converters.
- Phase 4: Pilot Deployment (Weeks 9-12). Execute a live marketing campaign on a subset of 1,000 customers using the model predictions. Compare results against a control group.
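The Phase 3 target can be checked directly from validation-set scores. The sketch below ranks records by predicted score, computes the share of all converters found in the top decile, and computes decile-wise lift (each decile's response rate divided by the overall rate). The scores are synthetic, drawn so that converters tend to score higher; the 192 positives out of 2,000 records simply mirror a 9.6 percent rate on a 40 percent validation partition.

```python
import random

def rank_by_score(scores, labels):
    """Pair labels with scores and sort from highest to lowest score."""
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

def top_decile_capture(scores, labels):
    """Share of all actual converters found in the top 10% of records by score."""
    ranked = rank_by_score(scores, labels)
    top_n = max(1, len(ranked) // 10)
    captured = sum(label for _, label in ranked[:top_n])
    return captured / sum(labels)

def decile_lift(scores, labels):
    """Response rate in each decile divided by the overall response rate.

    Records left over when len(labels) is not a multiple of 10 are ignored.
    """
    ranked = rank_by_score(scores, labels)
    base_rate = sum(labels) / len(labels)
    size = len(ranked) // 10
    lifts = []
    for d in range(10):
        chunk = ranked[d * size:(d + 1) * size]
        lifts.append((sum(label for _, label in chunk) / size) / base_rate)
    return lifts

# SYNTHETIC validation set: 2,000 records, 192 converters (9.6 percent).
random.seed(1)
labels = [1] * 192 + [0] * 1808
scores = [random.gauss(0.65 if y else 0.25, 0.15) for y in labels]

capture = top_decile_capture(scores, labels)
lifts = decile_lift(scores, labels)
print(f"top-decile capture: {capture:.1%} (Phase 3 target: at least 50%)")
print("decile-wise lift:", [round(v, 2) for v in lifts])
print("meets target:", capture >= 0.5)
```

Capturing 50 percent of converters in the top 10 percent of the list corresponds to a top-decile lift of 5.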
Key Constraints
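- Data Quality: ZIP Code is a high-cardinality variable (one code for each of the 5,000 records, with many distinct values); if codes are not grouped into regions or discarded, encoding them inflates the number of features and the model will suffer from the curse of dimensionality. Treating ZIP Code as a plain number is no better, because the numeric distance between two codes has little business meaning.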
- Class Imbalance: With only 480 positive cases, the model may struggle to learn the characteristics of the minority class without oversampling or adjusting the cutoff probability.
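One way to handle the ZIP Code constraint is to collapse codes into coarser regions before encoding. The sketch below assumes that the first three digits of a ZIP code are an acceptable regional grouping and folds rare prefixes into an OTHER bucket; the ZIP codes are hypothetical and the `min_count` threshold is an arbitrary choice. Region-level features can still act as proxies for protected characteristics, so they remain subject to the bias review discussed under Unaddressed Risks.

```python
from collections import Counter

def zip_prefix(zip_code):
    """First three digits of a ZIP code, zero-padded to five digits first."""
    return str(zip_code).zfill(5)[:3]

def frequent_prefixes(zip_codes, min_count):
    """Prefixes that occur at least `min_count` times."""
    counts = Counter(zip_prefix(z) for z in zip_codes)
    return {prefix for prefix, n in counts.items() if n >= min_count}

def to_region(zip_code, keep, other="OTHER"):
    """Map a ZIP code to its prefix if that prefix is common, otherwise to `other`."""
    prefix = zip_prefix(zip_code)
    return prefix if prefix in keep else other

# HYPOTHETICAL ZIP codes for illustration
zips = [94720, 94710, 94705, 90089, 90024, 92037, 95616, 94305, 94304, 94301]
keep = frequent_prefixes(zips, min_count=2)
regions = [to_region(z, keep) for z in zips]
print(regions)
print(f"{len(set(zips))} raw codes -> {len(set(regions))} regions")
```

The resulting region labels can then be dummy-encoded in Phase 1 alongside Education.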
Risk-Adjusted Implementation Strategy
To mitigate the risk of model decay, the bank must implement a feedback loop where the results of the pilot campaign are fed back into the training set. If the k-NN model shows high variance, the team should pivot to an ensemble method (Random Forest) in the second iteration to improve stability. The implementation assumes a 0.5 probability cutoff, but with only 9.6 percent of customers converting, that cutoff is likely to flag very few prospects; it must be adjusted based on the actual cost of marketing versus the revenue of a loan.
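The cost-based cutoff follows from one inequality: contacting a customer with acceptance probability p is worthwhile when p times the value of a conversion exceeds the cost of the contact, so the break-even cutoff is cost / value. The sketch below compares that cutoff with 0.5. The cost, value, and probabilities are hypothetical because the case does not state them, and the calculation assumes the model's probabilities are reasonably calibrated (raw k-NN vote shares are only coarse estimates).

```python
# HYPOTHETICAL economics: the case does not state a cost per contact or a loan value.
COST_PER_CONTACT = 20.0
VALUE_PER_CONVERSION = 400.0

def break_even_cutoff(cost, value):
    """Contacting pays off when p * value >= cost, i.e. when p >= cost / value."""
    return cost / value

def expected_profit(probabilities, cutoff, cost, value):
    """Expected profit of contacting every customer whose probability meets the cutoff."""
    return sum(p * value - cost for p in probabilities if p >= cutoff)

# HYPOTHETICAL predicted acceptance probabilities for eight customers
probs = [0.85, 0.62, 0.41, 0.22, 0.12, 0.07, 0.04, 0.02]

cutoff = break_even_cutoff(COST_PER_CONTACT, VALUE_PER_CONVERSION)
for c in (0.5, cutoff):
    contacts = sum(p >= c for p in probs)
    profit = expected_profit(probs, c, COST_PER_CONTACT, VALUE_PER_CONVERSION)
    print(f"cutoff={c:.2f}  contacts={contacts}  expected profit={profit:.0f}")
```

Under these assumed figures the break-even cutoff is 0.05, and moving from 0.5 to 0.05 raises the expected profit from 548 to 796, because customers well below 0.5 still carry positive expected value.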
Executive Review and BLUF
BLUF
Universal Bank must replace its current marketing approach with a k-Nearest Neighbors (k-NN) predictive model. The current 9.6 percent conversion rate is insufficient and results in excessive marketing spend on non-responsive customers. By partitioning data and applying supervised learning, the bank can identify the specific customer profiles (largely driven by the intersection of high income, professional education, and existing CD accounts) that are most likely to convert. Implementation should focus on the top two deciles of predicted probability, which are expected to contain the majority of successful conversions; this should be confirmed on the validation set before launch. This transition should increase campaign efficiency and portfolio growth without expanding the underlying risk. APPROVED FOR LEADERSHIP REVIEW.
Dangerous Assumption
The single most consequential premise is that historical data from the previous campaign remains a valid proxy for future behavior. If macroeconomic conditions (e.g., interest rate hikes) have changed since the data was collected, the drivers of loan acceptance will shift, rendering the model obsolete before deployment.
Unaddressed Risks
- Algorithmic Bias: The use of ZIP Code data may inadvertently lead to redlining or discriminatory lending patterns, creating significant regulatory and reputational risk. Probability: Medium. Consequence: High.
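- Overfitting: The model may perform well on the validation set, which is also used to tune k, yet fail in the real world because of noise variables such as ID, which carries no behavioral signal, or redundant variables such as Experience, which is very highly correlated with Age and effectively double-weights age in a distance-based model. Probability: High. Consequence: Medium.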
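A simple screen for this risk is to drop identifier columns outright and then drop one member of any pair of features whose absolute Pearson correlation exceeds a threshold. The sketch below uses synthetic Age, Experience, and Income values (Experience modeled as roughly Age minus 25) and an assumed threshold of 0.95; neither the data nor the threshold comes from the case.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# SYNTHETIC data: Experience is modeled as roughly Age minus 25 years.
random.seed(3)
age = [random.randint(23, 67) for _ in range(500)]
experience = [max(0, a - 25 + random.randint(-2, 2)) for a in age]
income = [random.uniform(10, 220) for _ in range(500)]
columns = {"ID": list(range(1, 501)), "Age": age,
           "Experience": experience, "Income": income}

DROP_ALWAYS = {"ID"}          # row identifier, carries no behavioral signal
REDUNDANCY_THRESHOLD = 0.95   # ASSUMED threshold for "near-duplicate" features

features = [name for name in columns if name not in DROP_ALWAYS]
dropped = set(DROP_ALWAYS)
for i, a in enumerate(features):
    for b in features[i + 1:]:
        if a in dropped or b in dropped:
            continue
        r = pearson(columns[a], columns[b])
        if abs(r) >= REDUNDANCY_THRESHOLD:
            dropped.add(b)    # keep the first feature of a near-duplicate pair
            print(f"dropping {b}: |r| with {a} = {abs(r):.3f}")
kept = [name for name in features if name not in dropped]
print("kept:", kept)
```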
Unconsidered Alternative
The analysis focuses on predicting who will accept a loan, but it ignores the probability of default. A more effective strategy would be a two-stage model: first predicting acceptance propensity, then filtering those prospects through a credit-risk model. This ensures the bank is not successfully marketing to high-risk individuals who are likely to accept any credit offer because they cannot obtain it elsewhere.
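The two-stage idea can be sketched as a filter chain: keep prospects whose acceptance probability clears one threshold, then discard those whose predicted default probability exceeds another, and rank the survivors by acceptance probability. The `Prospect` record type, both thresholds, and all scores are hypothetical; the case supplies no credit-risk model.

```python
from dataclasses import dataclass

@dataclass
class Prospect:
    customer_id: int
    acceptance_prob: float   # stage 1: propensity model output
    default_prob: float      # stage 2: HYPOTHETICAL credit-risk model output

def two_stage_targets(prospects, min_acceptance, max_default):
    """Keep prospects likely to accept AND unlikely to default, ranked by acceptance."""
    likely = [p for p in prospects if p.acceptance_prob >= min_acceptance]
    safe = [p for p in likely if p.default_prob <= max_default]
    return sorted(safe, key=lambda p: p.acceptance_prob, reverse=True)

prospects = [
    Prospect(1, 0.81, 0.02),
    Prospect(2, 0.74, 0.19),   # likely to accept, but high default risk
    Prospect(3, 0.55, 0.04),
    Prospect(4, 0.08, 0.01),   # low risk, but unlikely to accept
]
targets = two_stage_targets(prospects, min_acceptance=0.3, max_default=0.05)
print([p.customer_id for p in targets])
```

Prospect 2 is exactly the case the paragraph warns about: a strong responder that a propensity-only model would solicit first.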
MECE Analysis of Customer Segments
| Segment | Characteristics | Strategic Action |
| --- | --- | --- |
| High Propensity / High Credit | High Income, CD Account, Low Mortgage | Primary target for direct solicitation. |
| High Propensity / Low Credit | High CCAvg, Multi-family, No Securities | Exclude to prevent portfolio risk. |
| Low Propensity / High Credit | Low Income, Online Only, High Education | Retain as liability customers; do not solicit. |
| Low Propensity / Low Credit | Low Income, No existing accounts | No action; minimize contact costs. |