class: middle, inverse

.leftcol30[
<center>
<img src="https://github.com/emse-madd-gwu/emse-madd-gwu.github.io/raw/master/images/madd_hex_sticker.png" width=250>
</center>
]

.rightcol70[

# Week 7: .fancy[Utility Models]

### EMSE 6035: Marketing Analytics for Design Decisions

### John Paul Helveston

### October 13, 2021

]

---
class: inverse, middle

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. Linear & discrete parameters
### BREAK
### 4. Outside good
### 5. Team project utility models

---
class: inverse, middle

# Week 7: .fancy[Utility Models]

### 1. .orange[Utility models]
### 2. Exploring choice data
### 3. Linear & discrete parameters
### BREAK
### 4. Outside good
### 5. Team project utility models

---
class: center

# Random utility model

<br>

## The utility for alternative `\(j\)` is

# `$$\tilde{u}_j = v_j + \tilde{\varepsilon}_j$$`

--

## `\(v_j\)` = Things we observe (non-random variables)

## `\(\tilde{\varepsilon}_j\)` = Things we _don't_ observe (random variable)

---
class: center

# `$$\tilde{u}_j = v_j + \tilde{\varepsilon}_j$$`

<center>
<img src="images/utility.png" width=600>
</center>

---
class: inverse

# Practice Question 1

a) A random variable, `\(\tilde{x}\)`, has the PDF, `\(f_{\tilde{x}}(x)\)`. Write the equation to compute its total probability (hint: think area under the curve!). What is the answer to the equation?

--

b) A random variable, `\(\tilde{x}\)`, has a uniform distribution between the values 0 and 1. Draw the probability density function (PDF) and cumulative distribution function (CDF) of `\(\tilde{x}\)`.

--

c) The value of a random variable, `\(\tilde{x}\)`, is determined by rolling one fair, six-sided die. Draw the PDF and CDF of `\(\tilde{x}\)`.
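---

# .center[Simulating the random utility model]

The random utility model above can be sketched as a small simulation. This is an illustration under stated assumptions, not part of the course code: the values in `v` are hypothetical, and the errors are assumed to follow a standard Gumbel distribution (the assumption the logit model makes). Each simulated chooser picks the alternative with the highest total utility.

```r
set.seed(123)

# Hypothetical observed utilities for three alternatives (assumed values)
v <- c(a = 1.0, b = 1.5, c = 0.5)
n <- 10000  # number of simulated choosers

# Standard Gumbel draws via the inverse CDF: e = -log(-log(U)), U ~ Uniform(0, 1)
eps <- matrix(-log(-log(runif(n * length(v)))), nrow = n)

# Total utility: observed part plus random part (one row per chooser)
u <- sweep(eps, 2, v, "+")

# Each chooser picks the alternative with the highest total utility
choice <- max.col(u, ties.method = "first")
sim_shares <- setNames(tabulate(choice, nbins = length(v)) / n, names(v))

# With Gumbel errors, the shares should approximate exp(v) / sum(exp(v))
logit_probs <- exp(v) / sum(exp(v))
round(rbind(simulated = sim_shares, logit = logit_probs), 3)
```

The two rows should roughly agree, and the gap should shrink as `n` grows.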
---
class: center

## **Logit model**: Assume that `\(\tilde{\varepsilon}_j\)` ~ [Gumbel Distribution](https://en.wikipedia.org/wiki/Gumbel_distribution)

.leftcol[

## `$$\tilde{u}_j = v_j + \tilde{\varepsilon}_j$$`

<center>
<img src="images/utility.png" width=450>
</center>

]

--

.rightcol[

## Probability of choosing alternative `\(j\)`:

# `$$P_j = \frac{e^{v_j}}{\sum_k{e^{v_k}}}$$`

]

---
class: inverse

# Practice Question 2

a) A consumer is making a choice between two bars of chocolate:

- Milk chocolate `\((m)\)`
- Dark chocolate `\((d)\)`

Assume that we know the observed utility of each bar to be `\(v_m = 3\)` and `\(v_d = 4\)`. Using a logit model, compute the probabilities of choosing each bar: `\(P_m\)` and `\(P_d\)`.

--

b) A third bar of chocolate is now added to the choice set. It is identical to the milk chocolate bar except for a slightly different wrapper (which has no effect on the consumer's utility). Now, `\(v_{m1} = v_{m2} = 3\)`, and `\(v_d = 4\)`. Based on the probabilities from part a), what would we expect the probabilities of choosing each bar to be? What probabilities does the logit model produce?

---
class: center

### **"Observed utility" `\((v_j)\)` is a weighted sum of attribute values**

<br>

## `$$v_j = \beta_1 x_{j}^{\mathrm{A}} + \beta_2 x_j^{\mathrm{B}} + \dots$$`

## Each `\(x_j\)` is an observable attribute (_price_, etc.)
--

<br>

## We know `\(x_{j}^{\mathrm{A}}, x_{j}^{\mathrm{B}}, \dots\)`,<br>**we want to _estimate_** `\(\beta_1, \beta_2, \dots\)`

---

# .center[Notation Convention]

.leftcol[

## Continuous: `\(x_j\)`

## `$$v_j = \beta_1 x_{j}^{\mathrm{price}} + \dots$$`

```
#>   price
#> 1     1
#> 2     2
#> 3     3
```

]

.rightcol[

## Discrete: `\(\delta_j\)`

## `$$v_j = \beta_1 \delta_{j}^{\mathrm{ford}} + \beta_2 \delta_{j}^{\mathrm{gm}} + \dots$$`

```
#>   brand brand_BMW brand_Ford brand_GM
#> 1  Ford         0          1        0
#> 2    GM         0          0        1
#> 3   BMW         1          0        0
```

]

---
class: inverse

# Practice Question 3

<table>
<thead>
<tr>
<th style="text-align:left;"> Attribute </th>
<th style="text-align:left;"> Bar 1 </th>
<th style="text-align:left;"> Bar 2 </th>
<th style="text-align:left;"> Bar 3 </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Price </td>
<td style="text-align:left;"> $1.20 </td>
<td style="text-align:left;"> $1.50 </td>
<td style="text-align:left;"> $3.00 </td>
</tr>
<tr>
<td style="text-align:left;"> % Cacao </td>
<td style="text-align:left;"> 10% </td>
<td style="text-align:left;"> 60% </td>
<td style="text-align:left;"> 80% </td>
</tr>
</tbody>
</table>

a) Write out a model for the _observed_ utility of each chocolate bar in the above set.

--

b) If the coefficient for the _price_ attribute is -0.1 and the coefficient for the _% Cacao_ attribute is 0.1, what is the difference in observed utility between bars 3 and 1?

--

.leftcol[

c) With the addition of the _brand_ attribute, repeat part a.

]

.rightcol[

<table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> Attribute </th>
<th style="text-align:left;"> Bar 1 </th>
<th style="text-align:left;"> Bar 2 </th>
<th style="text-align:left;"> Bar 3 </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Price </td>
<td style="text-align:left;"> $1.20 </td>
<td style="text-align:left;"> $1.50 </td>
<td style="text-align:left;"> $3.00 </td>
</tr>
<tr>
<td style="text-align:left;"> % Cacao </td>
<td style="text-align:left;"> 10% </td>
<td style="text-align:left;"> 60% </td>
<td style="text-align:left;"> 80% </td>
</tr>
<tr>
<td style="text-align:left;"> Brand </td>
<td style="text-align:left;"> Hershey </td>
<td style="text-align:left;"> Lindt </td>
<td style="text-align:left;"> Ghirardelli </td>
</tr>
</tbody>
</table>

]

---
class: inverse
## Your Turn

.leftcol[

Let's say our utility function is:

.font80[$$v_j = \beta_1 x_j^{\mathrm{price}} + \beta_2 x_j^{\mathrm{cacao}} + \beta_3 \delta_j^{\mathrm{hershey}} + \beta_4 \delta_j^{\mathrm{lindt}}$$]

And we estimate the following coefficients:

Parameter | Coefficient
----------|-----------
`\(\beta_1\)` | -0.1
`\(\beta_2\)` | 0.1
`\(\beta_3\)` | -2.0
`\(\beta_4\)` | -0.1

]

.rightcol[

a) What are the expected probabilities of choosing each of these bars using a logit model?

<table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> Attribute </th>
<th style="text-align:left;"> Bar 1 </th>
<th style="text-align:left;"> Bar 2 </th>
<th style="text-align:left;"> Bar 3 </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Price </td>
<td style="text-align:left;"> $1.20 </td>
<td style="text-align:left;"> $1.50 </td>
<td style="text-align:left;"> $3.00 </td>
</tr>
<tr>
<td style="text-align:left;"> % Cacao </td>
<td style="text-align:left;"> 10% </td>
<td style="text-align:left;"> 60% </td>
<td style="text-align:left;"> 80% </td>
</tr>
<tr>
<td style="text-align:left;"> Brand </td>
<td style="text-align:left;"> Hershey </td>
<td style="text-align:left;"> Lindt </td>
<td style="text-align:left;"> Ghirardelli </td>
</tr>
</tbody>
</table>

b) What price would Bar 2 need to have to reach a 50% market share?

]

---
class: inverse, middle

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. .orange[Exploring choice data]
### 3. Linear & discrete parameters
### BREAK
### 4. Outside good
### 5. Team project utility models

---
class: center

## Download the [logitr-cars](https://github.com/emse-madd-gwu/logitr-cars) repo from GitHub

<center>
<img src="images/logitr-cars.png" width=900>
</center>

---

# .center[Exploring choice data]

<br>

.rightcol80[

## 1. Open `logitr-cars.Rproj`

## 2. Open `code/2.1-explore-data.R`

]

---
class: inverse, middle

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. .orange[Linear & discrete parameters]
### BREAK
### 4. Outside good
### 5. Team project utility models

---

# .center[Dummy-coded variables]

.center[**Dummy coding**: 1 = "Yes", 0 = "No"]

--

.leftcol[

Data frame with one variable: _price_

```r
data <- data.frame(price = c(10, 20, 30))
data
```

```
#>   price
#> 1    10
#> 2    20
#> 3    30
```

]

--

.rightcol[

Add dummy columns for each price "level"

```r
library(fastDummies)
dummy_cols(data, "price")
```

```
#>   price price_10 price_20 price_30
#> 1    10        1        0        0
#> 2    20        0        1        0
#> 3    30        0        0        1
```

]

---

.leftcol[

.center[
### Model _price_ as _continuous_

`\(v_j = \beta_1 x_j^{\mathrm{price}}\)`
]

```r
model <- logitr(
  data   = data,
  choice = "choice",
  obsID  = "obsID",
  pars   = "price"
)
```

<br>

Coef. | Interpretation
------|------------------
β1 | change in utility per unit increase in _price_

]

--

.rightcol[

.center[
### Model _price_ as _discrete_

`\(v_j = \beta_1 \delta_j^{\mathrm{price = 20}} + \beta_2 \delta_j^{\mathrm{price = 30}}\)`
]

```r
model <- logitr(
  data   = data,
  choice = "choice",
  obsID  = "obsID",
  pars   = c("price_20", "price_30")
)
```

.center[Reference level: _price=10_]

Coef. | Interpretation
------|------------------
β1 | utility for _price=20_ relative to _price=10_
β2 | utility for _price=30_ relative to _price=10_

]

---

# .center[Estimating utility models]

<br>

.rightcol80[

## 1. Open `logitr-cars.Rproj`

## 2. Open `code/3.1-model-mnl.R`

]

---

.leftcol[

# `mnl_dummy`

All dummy-coded variables

```r
pars = c(
  "price_20", "price_25",
  "fuelEconomy_25", "fuelEconomy_30",
  "accelTime_7", "accelTime_8",
  "powertrain_Electric")
```

Reference Levels:

- Price: 15
- Fuel Economy: 20
- Accel. Time: 6
- Powertrain: "Gasoline"

]

--

.rightcol[

# `mnl_linear`

All continuous (linear), except for `powertrain_Electric`

```r
pars = c(
  'price', 'fuelEconomy', 'accelTime',
  'powertrain_Electric')
```

Reference Levels:

- Powertrain: "Gasoline"

]

---
class: inverse
## Your Turn

1) Run the code chunk to read in the `data.csv` file in the "data" folder, which contains choice observations for chocolate bars with the following attributes:

.font80[

Attribute | Description
----------|----------------------
`price` | Price in $
`percent_cacao` | % Cacao (how "dark" the chocolate is)
`crispy_rice` | 1 if the bar contains crispy rice, 0 otherwise
`brand` | "Hershey", "Lindt", or "Ghirardelli"

]

2) Write code to estimate the following utility model<br>(HINT: you may need to make some dummy-coded variables!):

`$$\tilde{u}_j = \beta_1 x_j^{\mathrm{price}} + \beta_2 x_j^{\mathrm{\%cacao}} + \beta_3 \delta_j^{\mathrm{crispy}} + \beta_4 \delta_j^{\mathrm{hershey}} + \beta_5 \delta_j^{\mathrm{lindt}} + \tilde{\varepsilon}_j$$`

3) Write code to plot how utility changes with the _price_ attribute.

---
class: inverse, middle

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. Linear & discrete parameters
### BREAK
### 4. .orange[Outside good]
### 5. Team project utility models

---

## .center[Estimating utility models with an _Outside Good_]

<br>

.rightcol80[

## 1. Open `logitr-cars.Rproj`

## 2. Open `code/4.1-model-og.R`

]

---
class: inverse, middle

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. Linear & discrete parameters
### BREAK
### 4. Outside good
### 5. .orange[Team project utility models]

---

# .center[Simulating choice data]

.leftcol40[

.center[Random choices]

```r
data <- simulateChoices(
  survey,
  altID = "altID",
  obsID = "obsID"
)
```

]

--

.rightcol60[

.center[Choices according to assumed model

.font80[
`\(v_j = -0.1 x_j^{\mathrm{price}} + 0.1 x_j^{\mathrm{fuelEconomy}} + 0.1 x_j^{\mathrm{accelTime}} - 4 \delta_j^{\mathrm{electric}}\)`
]]

```r
data <- simulateChoices(
  survey,
  altID = "altID",
  obsID = "obsID",
  pars = list(
    price = -0.1,
    fuelEconomy = 0.1,
    accelTime = 0.1,
    powertrain_Electric = -4
  )
)
```

]

---

# .center[Estimate a choice model]

`$$v_j = \beta_1 x_j^{\mathrm{price}} + \beta_2 x_j^{\mathrm{fuelEconomy}} + \beta_3 x_j^{\mathrm{accelTime}} + \beta_4 \delta_j^{\mathrm{electric}}$$`

```r
model <- logitr(
  data   = data,
  choice = "choice",
  obsID  = "obsID",
  pars   = c("price", "fuelEconomy", "accelTime", "powertrain_Electric")
)
```

---
class: inverse
## Your Turn

### As a team:

1. Go back to your code from last week where you created your choice questions.
2. Write out a utility model for your project.
3. Write code to simulate data according to your utility model (pick some fake parameter values).
4. Write code to estimate a model using your simulated data.
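---

# .center[Simulate, then estimate: a base R sketch]

The simulate-then-estimate workflow above can be sketched without any packages. This is a conceptual illustration only, not how `simulateChoices()` or `logitr()` are implemented: the design, the data frame `sim`, the helper `logit_probs()`, and the parameter values in `beta_true` are all hypothetical. If the workflow is sound, the estimates should land close to the assumed parameters.

```r
set.seed(456)

# Hypothetical design: 2,000 choice questions with 3 alternatives each
n_obs <- 2000
n_alt <- 3
sim <- data.frame(
  obsID    = rep(seq_len(n_obs), each = n_alt),
  price    = sample(c(15, 20, 25), n_obs * n_alt, replace = TRUE),
  electric = sample(c(0, 1), n_obs * n_alt, replace = TRUE)
)

# Assumed ("fake") parameters: v_j = -0.1 * price - 1 * electric
beta_true <- c(price = -0.1, electric = -1)

# Logit probability of each alternative within its choice question
logit_probs <- function(beta, d) {
  v <- beta[1] * d$price + beta[2] * d$electric
  ev <- exp(v - ave(v, d$obsID, FUN = max))  # subtract max for stability
  ev / ave(ev, d$obsID, FUN = sum)
}

# Simulate one choice per question, drawn with the logit probabilities
p <- logit_probs(beta_true, sim)
sim$choice <- ave(p, sim$obsID, FUN = function(pr) {
  as.numeric(seq_along(pr) == sample(length(pr), 1, prob = pr))
})

# Estimate the parameters by minimizing the negative log-likelihood
neg_loglik <- function(beta) {
  -sum(log(logit_probs(beta, sim)[sim$choice == 1]))
}
fit <- optim(c(0, 0), neg_loglik, method = "BFGS")
beta_hat <- setNames(fit$par, names(beta_true))
round(rbind(true = beta_true, estimated = beta_hat), 3)
```

Recovering the assumed parameters from simulated data is a useful check that the utility model and the estimation code agree before real survey data arrive.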