Utility Models

.leftcol30[
<center>
<img src="https://github.com/emse-madd-gwu/emse-madd-gwu.github.io/raw/master/images/madd_hex_sticker.png" width=250>
</center>
]

### <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M496 128v16a8 8 0 0 1-8 8h-24v12c0 6.627-5.373 12-12 12H60c-6.627 0-12-5.373-12-12v-12H24a8 8 0 0 1-8-8v-16a8 8 0 0 1 4.941-7.392l232-88a7.996 7.996 0 0 1 6.118 0l232 88A8 8 0 0 1 496 128zm-24 304H40c-13.255 0-24 10.745-24 24v16a8 8 0 0 0 8 8h464a8 8 0 0 0 8-8v-16c0-13.255-10.745-24-24-24zM96 192v192H60c-6.627 0-12 5.373-12 12v20h416v-20c0-6.627-5.373-12-12-12h-36V192h-64v192h-64V192h-64v192h-64V192H96z"/></svg> EMSE 6035: Marketing Analytics for Design Decisions
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M224 256c70.7 0 128-57.3 128-128S294.7 0 224 0 96 57.3 96 128s57.3 128 128 128zm89.6 32h-16.7c-22.2 10.2-46.9 16-72.9 16s-50.6-5.8-72.9-16h-16.7C60.2 288 0 348.2 0 422.4V464c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48v-41.6c0-74.2-60.2-134.4-134.4-134.4z"/></svg> John Paul Helveston
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M0 464c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V192H0v272zm320-196c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zM192 268c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zM64 268c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12H76c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12H76c-6.6 0-12-5.4-12-12v-40zM400 64h-48V16c0-8.8-7.2-16-16-16h-32c-8.8 0-16 7.2-16 16v48H160V16c0-8.8-7.2-16-16-16h-32c-8.8 0-16 7.2-16 16v48H48C21.5 64 0 85.5 0 112v48h448v-48c0-26.5-21.5-48-48-48z"/></svg> October 13, 2021

---

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. Linear & discrete parameters

### BREAK

### 4. Outside good
### 5. Team project utility models

---

# Week 7: .fancy[Utility Models]

### 1. .orange[Utility models]
### 2. Exploring choice data
### 3. Linear & discrete parameters

### BREAK

### 4. Outside good
### 5. Team project utility models

---

# Random utility model

<br>

## The utility for alternative `$j$` is
# `$$\tilde{u}_j = v_j + \tilde{\varepsilon}_j$$`

## `$v_j$` = Things we observe (non-random variables)
## `$\tilde{\varepsilon}_j$` = Things we _don't_ observe (random variable)
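
## If the consumer picks the alternative with the highest utility (the usual assumption), the probability of choosing `$j$` is
# `$$P_j = \Pr\left(\tilde{u}_j > \tilde{u}_k \;\; \forall \, k \neq j\right)$$`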

---

# `$$\tilde{u}_j = v_j + \tilde{\varepsilon}_j$$`

---

# Practice Question 1

a) A random variable, `$\tilde{x}$`, has the PDF `$f_{\tilde{x}}(x)$`. Write the equation for its total probability (hint: think area under the curve!). What does the equation evaluate to?

b) A random variable, `$\tilde{x}$`, has a uniform distribution between the values 0 and 1. Draw the probability density function (PDF) and cumulative distribution function (CDF) of `$\tilde{x}$`.

c) The value of a random variable, `$\tilde{x}$`, is determined by rolling one fair, 6-sided die. Draw the PDF (a probability mass function, since `$\tilde{x}$` is discrete) and CDF of `$\tilde{x}$`.

---

## **Logit model**: Assume that `$\tilde{\varepsilon}_j$` ~ [Gumbel Distribution](https://en.wikipedia.org/wiki/Gumbel_distribution)

<center>
<img src="images/utility.png" width=450>
</center>

# `$$P_j = \frac{e^{v_j}}{\sum_k{e^{v_k}}}$$`
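
---

# .center[Logit probabilities in R]

A minimal sketch of the logit formula from the previous slide. The function name `logit_probs` and the example utilities are illustrative assumptions, not part of the course code.

```r
# Hypothetical helper: logit choice probabilities from observed utilities
logit_probs <- function(v) {
  # Subtracting max(v) leaves the ratios unchanged and avoids overflow
  exp_v <- exp(v - max(v))
  exp_v / sum(exp_v)
}

# Illustrative utilities for three alternatives (made-up values)
v <- c(a = 1, b = 2, c = 2)
probs <- logit_probs(v)
round(probs, 3)
```

Only differences in utility matter: adding the same constant to every `$v_j$` leaves the probabilities unchanged.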

---

# Practice Question 2

a) A consumer is making a choice between two bars of chocolate:

- Milk chocolate `$(m)$`
- Dark chocolate `$(d)$`

Assume that we know the observed utility of each bar to be `$v_m = 3$` and `$v_d = 4$`. Using a logit model, compute the probabilities of choosing each bar: `$P_m$` and `$P_d$`.

b) A third bar of chocolate is now added to the choice set. It is exactly the same as the milk chocolate bar, but with a slightly different wrapper (which has no effect on the consumer's utility). Now, `$v_{m1} = v_{m2} = 3$` and `$v_d = 4$`. Based on the probabilities from question a), what would we expect the probabilities of choosing each bar to be? What probabilities does the logit model produce?

---

### **"Observed utility" `$(v_j)$` is a weighted sum of attribute values**

<br>

## `$$v_j = \beta_1 x_{j}^{\mathrm{A}} + \beta_2 x_j^{\mathrm{B}} +  \dots$$`

## Each `$x_j$` is an observable attribute (_price_, etc.)

<br>

## We know `$x_{j}^{\mathrm{A}}, x_{j}^{\mathrm{B}}, \dots$`,<br>**we want to _estimate_** `$\beta_1, \beta_2, \dots$`
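
---

# .center[Observed utility in R]

A small sketch of the weighted sum on the previous slide, written as a matrix product. The attribute values and coefficients are made-up numbers chosen for illustration; in practice the coefficients are what we estimate.

```r
# Attribute values for three hypothetical alternatives (made-up numbers)
X <- cbind(
  price       = c(10, 20, 30),
  fuelEconomy = c(20, 25, 30)
)

# Assumed coefficients (normally these are what we estimate)
beta <- c(price = -0.1, fuelEconomy = 0.1)

# Observed utility: one weighted sum per alternative
v <- as.vector(X %*% beta)
v
```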

---

# .center[Notation Convention]

## Continuous: `$x_j$`

## `$$v_j = \beta_1 x_{j}^{\mathrm{price}} + \dots$$`

```
#>   price
#> 1     1
#> 2     2
#> 3     3
```

## Discrete: `$\delta_j$`

## `$$v_j = \beta_1 \delta_{j}^{\mathrm{ford}} + \beta_2 \delta_{j}^{\mathrm{gm}} + \dots$$`

```
#>   brand brand_BMW brand_Ford brand_GM
#> 1  Ford         0          1        0
#> 2    GM         0          0        1
#> 3   BMW         1          0        0
```

---

# Practice Question 3

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Attribute </th>
   <th style="text-align:left;"> Bar 1 </th>
   <th style="text-align:left;"> Bar 2 </th>
   <th style="text-align:left;"> Bar 3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Price </td>
   <td style="text-align:left;"> $1.20 </td>
   <td style="text-align:left;"> $1.50 </td>
   <td style="text-align:left;"> $3.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> % Cacao </td>
   <td style="text-align:left;"> 10% </td>
   <td style="text-align:left;"> 60% </td>
   <td style="text-align:left;"> 80% </td>
  </tr>
</tbody>
</table>

a) Write out a model for the _observed_ utility of each chocolate bar in the above set.

b) If the coefficient for the _price_ attribute were -0.1 and the coefficient for the _% Cacao_ attribute were 0.1, what would be the difference in observed utility between bars 3 and 1?

.rightcol[
<table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Attribute </th>
   <th style="text-align:left;"> Bar 1 </th>
   <th style="text-align:left;"> Bar 2 </th>
   <th style="text-align:left;"> Bar 3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Price </td>
   <td style="text-align:left;"> $1.20 </td>
   <td style="text-align:left;"> $1.50 </td>
   <td style="text-align:left;"> $3.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> % Cacao </td>
   <td style="text-align:left;"> 10% </td>
   <td style="text-align:left;"> 60% </td>
   <td style="text-align:left;"> 80% </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Brand </td>
   <td style="text-align:left;"> Hershey </td>
   <td style="text-align:left;"> Lindt </td>
   <td style="text-align:left;"> Ghirardelli </td>
  </tr>
</tbody>
</table>
]

---

## Your Turn

Let's say our utility function is:

.font80[$$v_j = \beta_1 x_j^{\mathrm{price}} + \beta_2 x_j^{\mathrm{cacao}} + \beta_3 \delta_j^{\mathrm{hershey}} + \beta_4 \delta_j^{\mathrm{lindt}}$$]

And we estimate the following coefficients:

Parameter | Coefficient 
----------|-----------
`$\beta_1$` | -0.1
`$\beta_2$` | 0.1
`$\beta_3$` | -2.0
`$\beta_4$` | -0.1

.rightcol[
a) What are the expected probabilities of choosing each of these bars using a logit model?

<table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Attribute </th>
   <th style="text-align:left;"> Bar 1 </th>
   <th style="text-align:left;"> Bar 2 </th>
   <th style="text-align:left;"> Bar 3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Price </td>
   <td style="text-align:left;"> $1.20 </td>
   <td style="text-align:left;"> $1.50 </td>
   <td style="text-align:left;"> $3.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> % Cacao </td>
   <td style="text-align:left;"> 10% </td>
   <td style="text-align:left;"> 60% </td>
   <td style="text-align:left;"> 80% </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Brand </td>
   <td style="text-align:left;"> Hershey </td>
   <td style="text-align:left;"> Lindt </td>
   <td style="text-align:left;"> Ghirardelli </td>
  </tr>
</tbody>
</table>

b) What price would Bar 2 have to be to get a 50% market share?
]

---

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. .orange[Exploring choice data]
### 3. Linear & discrete parameters

### BREAK

### 4. Outside good
### 5. Team project utility models

---

## Download the [logitr-cars](https://github.com/emse-madd-gwu/logitr-cars) repo from GitHub

---

# .center[Exploring choice data]

<br>

## 2. Open `code/2.1-explore-data.R`

---

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. .orange[Linear & discrete parameters]

### BREAK

### 4. Outside good
### 5. Team project utility models

---

# .center[Dummy-coded variables]

```r
data <- data.frame(price = c(10, 20, 30))

data
```

```
#>   price
#> 1    10
#> 2    20
#> 3    30
```

```r
library(fastDummies)

dummy_cols(data, "price")
```

```
#>   price price_10 price_20 price_30
#> 1    10        1        0        0
#> 2    20        0        1        0
#> 3    30        0        0        1
```
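
---

# .center[Why one level is dropped]

The dummy columns above always sum to 1 in every row, so one of them carries no extra information. The sketch below rebuilds the same columns in base R (an alternative to `dummy_cols()`, shown only for illustration) and drops the first level as the reference.

```r
data <- data.frame(price = c(10, 20, 30))

# Build one 0/1 column per price level using base R
price_levels <- sort(unique(data$price))
dummies <- sapply(price_levels, function(l) as.integer(data$price == l))
colnames(dummies) <- paste0("price_", price_levels)

# Every row sums to 1, so the full set is redundant with a constant
rowSums(dummies)

# Drop the first level (price_10) to use it as the reference
dummies[, -1, drop = FALSE]
```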

---

`$v_j = \beta_1 x_j^{\mathrm{price}}$`


```r
library(logitr)

# Price enters as a single continuous (linear) term
model <- logitr(
    data   = data,
    choice = "choice",
    obsID  = "obsID",
    pars   = "price"
)
```

<br>

Coef. | Interpretation
------|------------------
β1 | change in utility per unit increase in _price_


`$v_j = \beta_1 \delta_j^{\mathrm{price = 20}} + \beta_2 \delta_j^{\mathrm{price = 30}}$`

```r
# price_10 is the omitted reference level
model <- logitr(
    data   = data,
    choice = "choice",
    obsID  = "obsID",
    pars   = c("price_20", "price_30")
)
```

Coef. | Interpretation
------|------------------
β1 | utility for _price=20_ relative to _price=10_
β2 | utility for _price=30_ relative to _price=10_
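
---

# .center[Linear vs. dummy-coded price]

A sketch of what each coding implies for utility relative to _price=10_. All coefficient values here are hypothetical, picked only to show that the linear model forces equal steps between levels while the dummy-coded model does not.

```r
price_levels <- c(10, 20, 30)

# Linear coding: one slope (hypothetical value)
beta_price <- -0.1
v_linear <- beta_price * (price_levels - price_levels[1])

# Dummy coding: one coefficient per non-reference level (hypothetical values)
beta_dummy <- c(price_20 = -0.5, price_30 = -2.0)
v_dummy <- c(0, beta_dummy[["price_20"]], beta_dummy[["price_30"]])

# Utility relative to price = 10 under each coding
data.frame(price = price_levels, v_linear, v_dummy)
```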


---

# .center[Estimating utility models]

<br>

## 2. Open `code/3.1-model-mnl.R`

---

# `mnl_dummy`

All dummy-coded variables

```r
pars = c(
  "price_20", "price_25",
  "fuelEconomy_25", "fuelEconomy_30",
  "accelTime_7", "accelTime_8",
  "powertrain_Electric")
```

Reference Levels:

- Price: 15
- Fuel Economy: 20
- Accel. Time: 6
- Powertrain: "Gasoline"

All continuous (linear), except for `powertrain_Electric`

```r
pars = c(
  "price", "fuelEconomy", "accelTime",
  "powertrain_Electric")
```

Reference Levels:

- Powertrain: "Gasoline"

---

## Your Turn

1) Run the code chunk to read in the `data.csv` file in the "data" folder, which contains choice observations for chocolate bars with the following attributes:

.font80[
Attribute | Description 
----------|----------------------
`price` | Price in $
`percent_cacao` | % Cacao (how "dark" the chocolate is)
`crispy_rice` | 1 if the bar contains crispy rice, 0 otherwise
`brand` | "Hershey", "Lindt", or "Ghirardelli"
]

2) Write code to estimate the following utility model<br>(HINT: you may need to make some dummy-coded variables!):

`$$\tilde{u}_j = \beta_1 x_j^{\mathrm{price}} + \beta_2 x_j^{\mathrm{\%cacao}} + \beta_3 \delta_j^{\mathrm{crispy}} + \beta_4 \delta_j^{\mathrm{hershey}} + \beta_5 \delta_j^{\mathrm{lindt}} + \tilde{\varepsilon}_j$$`

3) Write code to plot the change in utility for the _price_ attribute.

---

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. Linear & discrete parameters

### BREAK

### 4. .orange[Outside good]
### 5. Team project utility models

---

## .center[Estimating utility models with an _Outside Good_]

<br>

## 2. Open `code/4.1-model-og.R`

---

# Week 7: .fancy[Utility Models]

### 1. Utility models
### 2. Exploring choice data
### 3. Linear & discrete parameters

### BREAK

### 4. Outside good
### 5. .orange[Team project utility models]

---

# .center[Simulating choice data]

```r
data <- simulateChoices(
  survey, 
  altID = "altID",
  obsID = "obsID"
)
```


.font80[
`$v_j = -0.1 x_j^{\mathrm{price}} + 0.1 x_j^{\mathrm{fuelEconomy}} + 0.1 x_j^{\mathrm{accelTime}} -4 \delta_j^{\mathrm{electric}}$`
]

```r
data <- simulateChoices(
  survey, 
  altID = "altID",
  obsID = "obsID",
  pars = list(
    price = -0.1,
    fuelEconomy = 0.1, 
    accelTime = 0.1,
    powertrain_Electric = -4
  )
)
```
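
---

# .center[What simulating choices means]

A base R sketch of the idea behind simulating choices from a utility model: compute `$v_j$`, convert to logit probabilities within each choice question, then draw one choice. The tiny `survey` data frame is made up, and this is not necessarily how `simulateChoices()` is implemented.

```r
set.seed(123)

# Hypothetical survey: 2 choice questions (obsID), 3 alternatives each
survey <- data.frame(
  obsID = rep(1:2, each = 3),
  altID = rep(1:3, times = 2),
  price = c(15, 20, 25, 20, 15, 25),
  fuelEconomy = c(20, 25, 30, 30, 20, 25),
  accelTime = c(6, 7, 8, 8, 6, 7),
  powertrain_Electric = c(0, 1, 0, 1, 0, 0)
)

# Observed utility using the parameter values from the previous slide
pars <- c(price = -0.1, fuelEconomy = 0.1, accelTime = 0.1,
          powertrain_Electric = -4)
survey$v <- as.vector(as.matrix(survey[names(pars)]) %*% pars)

# Logit probabilities within each choice question
exp_v <- exp(survey$v)
survey$prob <- exp_v / ave(exp_v, survey$obsID, FUN = sum)

# Draw one chosen alternative per question
survey$choice <- 0
for (id in unique(survey$obsID)) {
  rows <- which(survey$obsID == id)
  chosen <- sample(rows, size = 1, prob = survey$prob[rows])
  survey$choice[chosen] <- 1
}
survey
```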

---

# .center[Estimate a choice model]

`$$v_j = \beta_1 x_j^{\mathrm{price}} + \beta_2 x_j^{\mathrm{fuelEconomy}} + \beta_3 x_j^{\mathrm{accelTime}} + \beta_4 \delta_j^{\mathrm{electric}}$$`

```r
model <- logitr(
  data   = data,
  choice = "choice",
  obsID  = "obsID",
  pars   = c("price", "fuelEconomy", "accelTime", "powertrain_Electric")
)
```

---

## Your Turn

### As a team:

1. Go back to your code from last week where you created your choice questions.

2. Write out a utility model for your project.

3. Write code to simulate data according to your utility model - pick some fake parameter values.

4. Write code to estimate a model using your simulated data.