Lean Data Scientist

Your One Stop Shop for Data Science Interviews

Grocery Shopping Dataset

Welcome! We are a grocery delivery company that runs on data – and our awesome shoppers. One of the important ways we make customers happy is by delivering their groceries on time. To do this, we start by asking “How long will a shopping trip take?” Let’s find out.

Your goal is to predict the shopping time (the difference between shopping_started_at and shopping_ended_at in seconds) for trips in the test set. The shopping time only includes the time it takes for a shopper to pick the items in the store. It does not include the driving and delivering parts.
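As a minimal sketch of how the target could be derived for the training trips, the snippet below subtracts the two timestamps and converts the result to seconds. It assumes the timestamps arrive as ISO-style strings such as "2015-09-01 10:00:00"; the actual format in the data may differ, and the helper name shopping_time_seconds is hypothetical.

```python
from datetime import datetime


def shopping_time_seconds(started_at: str, ended_at: str) -> int:
    """Return the elapsed time between two timestamps, in whole seconds.

    Assumes both arguments are ISO-style strings (an assumption about the
    data, not something the brief specifies).
    """
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(ended_at)
    return int((end - start).total_seconds())


# Hypothetical trip, invented purely for illustration.
example = shopping_time_seconds("2015-09-01 10:00:00", "2015-09-01 10:15:00")
print(example)
```

If the timestamps carry time zone offsets, both columns should be parsed the same way so that the subtraction stays meaningful.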

  1. Perform any data cleaning, exploratory analysis and visualizations you may need to understand the data.
  2. Construct a predictive model and discuss why you chose your approach. 
  3. Assess the performance of your model, and discuss any alternatives you considered or concerns you may have.
  4. Generate a CSV file containing the predictions of the test trips in the following format:
trip_id,shopping_time
130622,900
130625,456
...

Note: shopping_time must be in seconds.
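One way to produce a file in this shape is sketched below with Python's standard csv module. The function name write_predictions and the output file name predictions.csv are hypothetical, and rounding predictions to whole seconds is an assumption based on the integer values in the sample rows above.

```python
import csv


def write_predictions(path, predictions):
    """Write (trip_id, shopping_time) pairs as a two-column CSV with a header.

    `predictions` is any iterable of (trip_id, seconds) pairs. Predicted
    seconds are rounded to integers (an assumption, see the lead-in).
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["trip_id", "shopping_time"])
        for trip_id, seconds in predictions:
            writer.writerow([trip_id, int(round(seconds))])


if __name__ == "__main__":
    # The two sample rows from the format specification.
    write_predictions("predictions.csv", [(130622, 900.0), (130625, 456.0)])
```

Opening the file with newline="" lets the csv module control line endings, which avoids stray blank lines between rows on some platforms.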

We want you to have the greatest chance of succeeding in this challenge, so please do the following:

  • Include your code and make sure it is well organized and clearly commented
  • Use either Python or R (these are the primary tools we use) and any open source libraries you’d like – submissions in any other language will not be reviewed
  • Make sure your output is formatted exactly as specified above
  • Include a written and/or visual summary of your work (such as R Markdown, a Jupyter notebook, or even just a text file or Google Doc) in addition to your code

The data you’ll get: