Lean Data Scientist

Your One Stop Shop for Data Science Interviews

House Flipping Dataset

Instructions

The following case is designed to showcase your strategic problem solving and data analysis skills. The problem statements are intentionally open-ended. It is possible to spend as much or as little time on this exercise as you would like. We recommend you take no more than ~4 hours to complete it. You should be able to analyze the data set below with excel or any data/statistic analysis tool like R or Python. It would be ideal if you could provide your final response in google slide format. Please also send over the backup to any analysis (e.g. SQL queries, sheets, excel, etc.).

Question 1

We’ve given you disguised data for the MLS (market) and our house flipping team. The spreadsheet includes counts of active listings, resale contracts, and related data. How enthusiastic or worried are you about our resale performance in 2017?

Question 2

Based on what you learned in Question 1, what do you hypothesize is driving any over/underperformance trends? Please limit to your top three hypotheses. For each hypothesis, what additional data would you request and what analyses would you run to validate?

Question 3

Chose one of your top hypotheses from Question 2 and assume you’ve validated that it’s the driver of the performance trend. Imagine that you are the pricing modeling team, propose a product or model solution to correct (in the instance of underperformance) or drive continued growth (if you’ve identified positive performance).

Field Definitions

  • price_band: original list price bucketed below and above $200K
  • zip_code: A, B, C, and D denote 4 different, illustrative zips
  • mls_listings: total active listings on the market on any given day
  • mls_contracts: resales contracts
  • od_listings: total Our active listings on the market on any given day
  • od_contracts: Our resales contracts
  • od_home_visits: total home visits on all our active listings

The data you’ll get: