# Week 9: .fancy[Uncertainty]

### EMSE 6035: Marketing Analytics for Design Decisions
### John Paul Helveston
### October 26, 2022

---

# Pilot Analysis Report

### Assignment is now [posted](https://madd.seas.gwu.edu/2022-Fall/project/4-pilot-analysis.html)

### Due 11/06 (that's 11 days from now)

---

# Week 9: .fancy[Uncertainty]

### 1. Computing uncertainty
### 2. Reshaping data

### BREAK

### 3. Cleaning pilot data
### 4. Estimating pilot data models

---

# Week 9: .fancy[Uncertainty]

### 1. .orange[Computing uncertainty]
### 2. Reshaping data

### BREAK

### 3. Cleaning pilot data
### 4. Estimating pilot data models

---

background-color: #EEEDEE
class: center

# Maximum likelihood estimation

---

background-color: #EEEDEE

### The _curvature_ of the log-likelihood function at its maximum is described by the Hessian

---

background-color: #EEEDEE

### We usually report parameter uncertainty as "standard errors", denoted by `$\sigma$`
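
A sketch of the relationship, assuming the usual large-sample result for maximum likelihood estimators: the covariance matrix of `$\hat{\beta}$` is approximated by the negative inverse of the Hessian evaluated at `$\hat{\beta}$`, and the standard errors are the square roots of its diagonal elements. The symbols `$\Sigma$` and `$k$` are notation introduced here for illustration.

$$
\Sigma = -\left[ \nabla_{\beta}^2 \ln(\mathcal{L}) \right]^{-1},
\qquad
\sigma_k = \sqrt{\Sigma_{kk}}
$$

Under a normal approximation, a 95% confidence interval for each parameter is roughly `$\hat{\beta}_k \pm 1.96 \, \sigma_k$`.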

---

# Practice Question 1

Suppose we estimate a model and get the following results:

$$
\hat{\beta} =
`\begin{bmatrix}
-0.4
\\ 
0.5
\end{bmatrix}`
$$

$$
\nabla_{\beta}^2 \ln(\mathcal{L}) =
`\begin{bmatrix}
-6000 & 60
\\ 
60 & -700
\end{bmatrix}`
$$

a) Use the Hessian to compute the standard errors for `$\hat{\beta}$`

b) Use the standard errors to compute a 95% confidence interval around `$\hat{\beta}$`

---

# .center[Simulating uncertainty]

We can use the coefficients and Hessian from an estimated model to draw from a multivariate normal distribution. These draws reflect parameter uncertainty.

```r
beta <- c(-0.7, 0.1, -4.0)

hessian <- matrix(c(
    -6000,   50,   60,
       50, -700,   50,
       60,   50, -300),
    ncol = 3, byrow = TRUE)
```

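```r
# Covariance matrix: negative inverse of the Hessian
covariance <- -1*solve(hessian)
# 100,000 coefficient draws from a multivariate normal distribution
draws <- MASS::mvrnorm(10^5, beta, covariance)

head(draws)
```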

```
#>            [,1]       [,2]      [,3]
#> [1,] -0.7083524 0.04549746 -3.982507
#> [2,] -0.6753968 0.11482772 -4.062954
#> [3,] -0.7102888 0.02038767 -3.973829
#> [4,] -0.6838151 0.11916572 -4.075903
#> [5,] -0.6968972 0.05953439 -3.890033
#> [6,] -0.7303328 0.15493535 -3.918313
```
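
One way to summarize the draws is sketched below, assuming the same `beta` and `hessian` as above. It compares a normal-approximation interval (`beta` plus or minus 1.96 standard errors) with the 2.5% and 97.5% quantiles of the draws. The seed and the names `se`, `ci_analytic`, and `ci_draws` are illustrative choices, not part of the course code.

```r
set.seed(123)  # arbitrary seed, only so the sketch is reproducible

beta <- c(-0.7, 0.1, -4.0)
hessian <- matrix(c(
    -6000,   50,   60,
       50, -700,   50,
       60,   50, -300),
    ncol = 3, byrow = TRUE)

# Standard errors: square roots of the diagonal of the covariance matrix
covariance <- -1*solve(hessian)
se <- sqrt(diag(covariance))

# Interval from the normal approximation
ci_analytic <- cbind(
    lower = beta - 1.96*se,
    upper = beta + 1.96*se)

# Interval from the simulated draws (one row per parameter)
draws <- MASS::mvrnorm(10^5, beta, covariance)
ci_draws <- t(apply(draws, 2, quantile, probs = c(0.025, 0.975)))

# Mean of the draws for each parameter, then both intervals
colMeans(draws)
ci_analytic
ci_draws
```

With this many draws the two sets of intervals should agree closely. The simulation approach becomes more useful when the quantity of interest is a nonlinear function of the coefficients.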

---

# .center[Simulating uncertainty]

We can use the coefficients and Hessian from an estimated model to draw from a multivariate normal distribution. These draws reflect parameter uncertainty.

```r
hist(draws[, 1])
```

```r
hist(draws[, 2])
```

```r
hist(draws[, 3])
```

---

# Practice Question 2

Suppose we estimate the following utility model describing preferences for cars:

$$
u_j = \alpha p_j + \beta_1 x_j^{mpg} + \beta_2 x_j^{elec} + \varepsilon_j
$$

a) Generate 10,000 draws of the model coefficients using the estimated coefficients and Hessian. Use the `mvrnorm()` function from the `MASS` package.

b) Use the draws to compute the mean and 95% confidence intervals of each parameter estimate.

The estimated model produces the following results:

Parameter | Coefficient
----------|------------
`$\alpha$` | -0.7
`$\beta_1$` | 0.1
`$\beta_2$` | -0.4

Hessian:

$$
`\begin{bmatrix}
-6000 & 50 & 60
\\ 
50 & -700 & 50
\\
60 & 50 & -300
\end{bmatrix}`
$$

---

## Download the [logitr-cars](https://github.com/emse-madd-gwu/logitr-cars) repo from GitHub

---

# .center[Computing and visualizing uncertainty]

<br>

## 1. Open `logitr-cars`

## 2. Open `code/5.1-uncertainty.R`

---

# Week 9: .fancy[Uncertainty]

### 1. Computing uncertainty
### 2. .orange[Reshaping data]

### BREAK

### 3. Cleaning pilot data
### 4. Estimating pilot data models

---

## .center[Federal R&D Spending by Department]

```
#> # A tibble: 6 × 15
#>    year   DHS   DOC   DOD   DOE   DOT   EPA   HHS Interior  NASA   NIH   NSF Other  USDA    VA
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1976     0   819 35696 10882  1142   968  9226     1152 12513  8025  2372  1191  1837   404
#> 2  1977     0   837 37967 13741  1095   966  9507     1082 12553  8214  2395  1280  1796   374
#> 3  1978     0   871 37022 15663  1156  1175 10533     1125 12516  8802  2446  1237  1962   356
#> 4  1979     0   952 37174 15612  1004  1102 10127     1176 13079  9243  2404  2321  2054   353
#> 5  1980     0   945 37005 15226  1048   903 10045     1082 13837  9093  2407  2468  1887   359
#> 6  1981     0   829 41737 14798   978   901  9644      990 13276  8580  2300  1925  1964   382
```

---

## .center[Federal R&D Spending by Department]

# "Wide" format

# "Long" format

```
#> # A tibble: 6 × 3
#>   department  year rd_budget_mil
#>   <chr>      <dbl>         <dbl>
#> 1 DOD         1976         35696
#> 2 NASA        1976         12513
#> 3 DOE         1976         10882
#> 4 HHS         1976          9226
#> 5 NIH         1976          8025
#> 6 NSF         1976          2372
```

---

## .center[Federal R&D Spending by Department]

# "Wide" format

```
#> [1] 42 15
```

# "Long" format

```
#> [1] 588   3
```
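
The two shapes are consistent: 42 years times 14 department columns gives 588 rows in the long table. A minimal sketch with a small made-up table shows the same bookkeeping. The `toy_wide` data and its column names are hypothetical, not the course data.

```r
library(tidyr)

# Hypothetical toy data: 3 years x 2 department columns
toy_wide <- data.frame(
    year = c(2001, 2002, 2003),
    A    = c(10, 11, 12),
    B    = c(20, 21, 22))

toy_long <- toy_wide %>%
    pivot_longer(
        names_to = "department",
        values_to = "budget",
        cols = -year)

# Long rows = wide rows x number of pivoted columns
dim(toy_wide)
dim(toy_long)
nrow(toy_long) == nrow(toy_wide) * 2
```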

---

# .center[Tidy data = "Long" format]

- Each **variable** has its own **column**
- Each **observation** has its own **row**

---

# Tidy data

- Each **variable** has its own **column**
- Each **observation** has its own **row**

---

# "Long" format

# "Wide" format

---

# .center[**Do the names describe the values?**]

## **Yes**: "Long" format

## **No**: "Wide" format

```
#> # A tibble: 6 × 8
#>    year   DHS   DOC   DOD   DOE   DOT   EPA   HHS
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1976     0   819 35696 10882  1142   968  9226
#> 2  1977     0   837 37967 13741  1095   966  9507
#> 3  1978     0   871 37022 15663  1156  1175 10533
#> 4  1979     0   952 37174 15612  1004  1102 10127
#> 5  1980     0   945 37005 15226  1048   903 10045
#> 6  1981     0   829 41737 14798   978   901  9644
```

---

# **Quick practice 1**: "long" or "wide" format?

**Description**: Tuberculosis cases in various countries

```
#> # A tibble: 6 × 4
#>   country      year  cases population
#>   <chr>       <dbl>  <dbl>      <dbl>
#> 1 Afghanistan  1999    745   19987071
#> 2 Afghanistan  2000   2666   20595360
#> 3 Brazil       1999  37737  172006362
#> 4 Brazil       2000  80488  174504898
#> 5 China        1999 212258 1272915272
#> 6 China        2000 213766 1280428583
```

---

# **Quick practice 2**: "long" or "wide" format?

**Description**: Word counts in LOTR trilogy

```
#> # A tibble: 9 × 4
#>   Film                       Race   Female  Male
#>   <chr>                      <chr>   <dbl> <dbl>
#> 1 The Fellowship Of The Ring Elf      1229   971
#> 2 The Fellowship Of The Ring Hobbit     14  3644
#> 3 The Fellowship Of The Ring Man         0  1995
#> 4 The Return Of The King     Elf       183   510
#> 5 The Return Of The King     Hobbit      2  2673
#> 6 The Return Of The King     Man       268  2459
#> 7 The Two Towers             Elf       331   513
#> 8 The Two Towers             Hobbit      0  2463
#> 9 The Two Towers             Man       401  3589
```

---

# **Quick practice 3**: "long" or "wide" format?

**Description**: Word counts in LOTR trilogy

```
#> # A tibble: 15 × 4
#>    Film                       Race   Gender Word_Count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Elf    Female       1229
#>  2 The Fellowship Of The Ring Elf    Male          971
#>  3 The Fellowship Of The Ring Hobbit Female         14
#>  4 The Fellowship Of The Ring Hobbit Male         3644
#>  5 The Fellowship Of The Ring Man    Female          0
#>  6 The Fellowship Of The Ring Man    Male         1995
#>  7 The Return Of The King     Elf    Female        183
#>  8 The Return Of The King     Elf    Male          510
#>  9 The Return Of The King     Hobbit Female          2
#> 10 The Return Of The King     Hobbit Male         2673
#> 11 The Return Of The King     Man    Female        268
#> 12 The Return Of The King     Man    Male         2459
#> 13 The Two Towers             Elf    Female        331
#> 14 The Two Towers             Elf    Male          513
#> 15 The Two Towers             Hobbit Female          0
```

---

# Reshaping data with

## `pivot_longer()` and `pivot_wider()`

---

background-color: #fff

# Reshaping data

## `pivot_longer()`<br>`pivot_wider()`

---

## .center[From "long" to "wide" with `pivot_wider()`]

```r
head(fed_spend_long)
```

```r
fed_spend_wide <- fed_spend_long %>%
    pivot_wider(
*       names_from = department,
*       values_from = rd_budget_mil)

head(fed_spend_wide)
```

```
#> # A tibble: 6 × 15
#>    year   DOD  NASA   DOE   HHS   NIH   NSF  USDA Interior   DOT   EPA   DOC   DHS    VA Other
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1976 35696 12513 10882  9226  8025  2372  1837     1152  1142   968   819     0   404  1191
#> 2  1977 37967 12553 13741  9507  8214  2395  1796     1082  1095   966   837     0   374  1280
#> 3  1978 37022 12516 15663 10533  8802  2446  1962     1125  1156  1175   871     0   356  1237
#> 4  1979 37174 13079 15612 10127  9243  2404  2054     1176  1004  1102   952     0   353  2321
#> 5  1980 37005 13837 15226 10045  9093  2407  1887     1082  1048   903   945     0   359  2468
#> 6  1981 41737 13276 14798  9644  8580  2300  1964      990   978   901   829     0   382  1925
```

---

## .center[From "wide" to "long" with `pivot_longer()`]

```r
head(fed_spend_wide)
```

```r
fed_spend_long <- fed_spend_wide %>%
    pivot_longer( 
*       names_to = "department",
*       values_to = "rd_budget_mil",
*       cols = DOD:Other)

head(fed_spend_long)
```

```
#> # A tibble: 6 × 3
#>    year department rd_budget_mil
#>   <dbl> <chr>              <dbl>
#> 1  1976 DOD                35696
#> 2  1976 NASA               12513
#> 3  1976 DOE                10882
#> 4  1976 HHS                 9226
#> 5  1976 NIH                 8025
#> 6  1976 NSF                 2372
```

---

## You can also set `cols` by naming the columns _not_ to pivot

```r
names(fed_spend_wide)
```

```
#>  [1] "year"     "DOD"      "NASA"     "DOE"      "HHS"      "NIH"      "NSF"      "USDA"     "Interior" "DOT"      "EPA"      "DOC"      "DHS"      "VA"       "Other"
```

```r
fed_spend_long <- fed_spend_wide %>%
    pivot_longer(
        names_to = "department", 
        values_to = "rd_budget_mil",
*       cols = -year)

head(fed_spend_long)
```
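
As a quick check that the two ways of setting `cols` are interchangeable, the sketch below pivots a small made-up table both ways and compares the results. The `toy_wide` data and the `budget` column name are hypothetical, not the course data.

```r
library(tidyr)

# Hypothetical toy data (not the course data set)
toy_wide <- data.frame(
    year = c(2001, 2002),
    A    = c(10, 11),
    B    = c(20, 21),
    C    = c(30, 31))

# Select the columns to pivot by range...
long_range <- toy_wide %>%
    pivot_longer(
        names_to = "department",
        values_to = "budget",
        cols = A:C)

# ...or by excluding the one column that should stay as-is
long_negated <- toy_wide %>%
    pivot_longer(
        names_to = "department",
        values_to = "budget",
        cols = -year)

# Compare the two long tables
all.equal(long_range, long_negated)
```

Excluding columns tends to be the more robust choice when the set of value columns is large or may change, since only the identifier columns need to be named.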

---

# Your turn: Reshaping Data

Open the `practice.Rmd` file.

Run the code chunk to read in the following two data files:

- `pv_cell_production.xlsx`: Data on solar photovoltaic cell production by country
- `milk_production.csv`: Data on milk production by state

Now modify the format of each:

- If the data are in "wide" format, convert them to "long" with `pivot_longer()`
- If the data are in "long" format, convert them to "wide" with `pivot_wider()`

---

# .fancy[Break]

---

# Week 9: .fancy[Uncertainty]

### 1. Computing uncertainty
### 2. Reshaping data

### BREAK

### 3. .orange[Cleaning pilot data]
### 4. Estimating pilot data models

---

## Download the [formr4conjoint](https://github.com/jhelvy/formr4conjoint) repo from GitHub

---

# .center[Cleaning formr survey data]

<br>

## 1. Open `formr4conjoint.Rproj`

## 2. Open `code/data_cleaning.R`

---

# Your Turn

## As a team, pick up where you left off last week and create a `choiceData` data frame in a "long" format

---

# Week 9: .fancy[Uncertainty]

### 1. Computing uncertainty
### 2. Reshaping data

### BREAK

### 3. Cleaning pilot data
### 4. .orange[Estimating pilot data models]

---

# .center[Estimating pilot data models]

<br>

## 1. Open `formr4conjoint.Rproj`

## 2. Open `code/modeling.R`

---

# Your Turn

## As a team:

1. Use your `choiceData` data frame to estimate preliminary choice models.  
2. Interpret your model coefficients with uncertainty.