Optimization & MLE

# Week 8: .fancy[Optimization & MLE]

### EMSE 6035: Marketing Analytics for Design Decisions
### John Paul Helveston
### October 16, 2024

---

# Quiz 3

### Download the template from the #class channel

### Make sure you unzip it!

### When done, submit your `quiz3.qmd` on Blackboard

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. Optimization (in general)

### BREAK

### 3. Joins
### 4. Pilot data cleaning

---

# Week 8: .fancy[Optimization & MLE]

### 1. .orange[Maximum likelihood estimation]
### 2. Optimization (in general)

### BREAK

### 3. Joins
### 4. Pilot data cleaning

---

background-color: #EEEDEE

## .center[Computing the likelihood]

`$x$`: an observation

`$f(x)$`: probability of observing `$x$`

---

background-color: #EEEDEE

## .center[Computing the likelihood]

`$x$`: an observation

`$f(x)$`: probability of observing `$x$`

`$\mathcal{L}(\theta | x)$`: likelihood that `$\theta$` are the true parameters, given that we observed `$x$`

**We want to estimate `$\theta$`**

---

background-color: #EEEDEE
class: center

## We actually compute the _log_-likelihood<br>(converts multiplication to addition)
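
A minimal sketch of why the log helps, using made-up observations and parameters (`x`, `mu`, and `sigma` below are illustrative assumptions, not course data). The likelihood is a product of densities; the log-likelihood is the sum of their logs.

``` r
# Hypothetical observations and parameters, chosen only for illustration
x <- c(4.8, 5.1, 5.6, 4.9)
mu <- 5
sigma <- 0.5

# Likelihood: product of the density of each observation
likelihood <- prod(dnorm(x, mean = mu, sd = sigma))

# Log-likelihood: sum of the log density of each observation
log_likelihood <- sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))

# Both routes give the same number
all.equal(log(likelihood), log_likelihood)

# With many observations the product underflows to 0,
# while the sum of logs stays finite
x_big <- rep(x, 500)
prod(dnorm(x_big, mean = mu, sd = sigma))
sum(dnorm(x_big, mean = mu, sd = sigma, log = TRUE))
```

Working on the log scale is mainly a numerical convenience: the parameters that maximize the log-likelihood also maximize the likelihood.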

---

# Practice Question 1

**Observations** - Height of students (inches):

```
#>  [1] 65 69 66 67 68 72 68 69 63 70
```

a) Let's say we know that the height of students, `$\tilde{x}$`, in a classroom follows a normal distribution. A professor obtains the above height measurements of the students in her classroom. What is the log-likelihood that `$\tilde{x} \sim \mathcal{N} (68, 4)$`? In other words, compute `$\ln \mathcal{L} (\mu = 68, \sigma = 4)$`.

b) Compute the log-likelihood using the same standard deviation `$(\sigma = 4)$` but with each of the following values for the mean, `$\mu$`: 66, 67, 68, 69, 70. How do the results compare? Which value of `$\mu$` produces the highest log-likelihood?
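
One possible way to check your answers in R. This is a sketch, not the official solution; the helper name `log_lik()` is an assumption introduced here, and the second parameter of the normal distribution is treated as the standard deviation, as in the question.

``` r
# Observed heights (inches) from the question
heights <- c(65, 69, 66, 67, 68, 72, 68, 69, 63, 70)

# Log-likelihood of a normal model for a given mean and standard deviation
log_lik <- function(mu, sigma, x) {
    sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Part a): one candidate mean
log_lik(mu = 68, sigma = 4, x = heights)

# Part b): several candidate means, same standard deviation
mus <- c(66, 67, 68, 69, 70)
log_liks <- sapply(mus, log_lik, sigma = 4, x = heights)
data.frame(mu = mus, log_lik = log_liks)
```

Compare the table against the sample mean, `mean(heights)`, to see which candidate sits closest to it.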

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. .orange[Optimization (in general)]

### BREAK

### 3. Joins
### 4. Pilot data cleaning

---

background-color: #EEEDEE
class: center, middle

# `$f(x)$`

---

# Practice Question 2

Consider the following function:

`$$f(x) = x^2 - 6x$$`

The gradient is:

`$$\nabla f(x) = 2x - 6$$`

Using the starting point `$x = 1$` and the step size `$\gamma =  0.3$`, apply the gradient descent method to compute the next **three** points in the search algorithm.
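
After working the steps by hand, you can check them with a short loop. This is a sketch of the update rule `$x_{k+1} = x_k - \gamma \nabla f(x_k)$`; the function names `f_grad()` and `gradient_descent()` are assumptions introduced here for illustration.

``` r
# Gradient of f(x) = x^2 - 6x
f_grad <- function(x) 2 * x - 6

# Take n_steps gradient descent steps from x0 with a fixed step size gamma.
# Returns the starting point followed by each new point.
gradient_descent <- function(grad, x0, gamma, n_steps) {
    x <- numeric(n_steps + 1)
    x[1] <- x0
    for (k in seq_len(n_steps)) {
        x[k + 1] <- x[k] - gamma * grad(x[k])
    }
    x
}

path <- gradient_descent(f_grad, x0 = 1, gamma = 0.3, n_steps = 3)
path
```

Try a larger `n_steps` to watch the points approach the minimum, where the gradient is zero.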

---

# Practice Question 3

Consider the following function:

$$
f(\underline{x}) = x_1^2 + 4x_2^2
$$

The gradient is:

$$
\nabla f(\underline{x}) =
`\begin{bmatrix}
2x_1
\\
8x_2
\end{bmatrix}`
$$

Using the starting point `$\underline{x}_0 = [1, 1]$` and the step size `$\gamma =  0.15$`, apply the gradient descent method to compute the next **three** points in the search algorithm.
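
The same update rule works when `$\underline{x}$` is a vector. A minimal sketch for checking your hand calculations; `grad_f()` and the `path` matrix are names introduced here for illustration.

``` r
# Gradient of f(x) = x1^2 + 4 * x2^2, returned as a vector
grad_f <- function(x) c(2 * x[1], 8 * x[2])

x <- c(1, 1)     # starting point
gamma <- 0.15    # step size

# Store the starting point and the next three points, one per row
path <- matrix(
    NA_real_, nrow = 4, ncol = 2,
    dimnames = list(paste0("step_", 0:3), c("x1", "x2"))
)
path[1, ] <- x

for (k in 1:3) {
    x <- x - gamma * grad_f(x)   # step against the gradient
    path[k + 1, ] <- x
}

path
```

Notice that `x2` changes sign between steps while `x1` does not: the gradient is steeper in the `x2` direction, so the same step size overshoots there.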

---

## Download the [logitr-cars](https://github.com/jhelvy/logitr-cars/archive/refs/heads/main.zip) repo from GitHub

---

# .center[Estimating utility models]

<br>

## 1. Open `logitr-cars.Rproj`

## 2. Open `code/3.1-model-mnl.R`

---

background-color: #EEEDEE

# Maximum likelihood estimation
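
Maximum likelihood estimation is an optimization problem: search for the parameters that make the log-likelihood as large as possible. A minimal sketch using the student heights from Practice Question 1 and base R's `optimize()`, with `$\sigma = 4$` held fixed; the search interval `c(60, 75)` is an assumption. This illustrates the idea only and is not how any particular modeling package is implemented.

``` r
# Observed heights (inches) from Practice Question 1
heights <- c(65, 69, 66, 67, 68, 72, 68, 69, 63, 70)

# Log-likelihood as a function of the mean only (sigma fixed at 4)
log_lik <- function(mu) {
    sum(dnorm(heights, mean = mu, sd = 4, log = TRUE))
}

# One-dimensional search for the mean that maximizes the log-likelihood
fit <- optimize(log_lik, interval = c(60, 75), maximum = TRUE)

fit$maximum     # estimated mean
fit$objective   # log-likelihood at the estimate
mean(heights)   # for comparison
```

For a normal model with known `$\sigma$`, the estimate should land very close to the sample mean.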

---

# .fancy[Break]

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. Optimization (in general)

### BREAK

### 3. .orange[Joins]
### 4. Pilot data cleaning

---

## What's wrong with this map?

---

### Likely culprit: Merging two columns

``` r
head(names)
```

```
#>              state_name
#> 1               Alabama
#> 2                Alaska
#> 3               Arizona
#> 4              Arkansas
#> 5   Armed Forces Africa
#> 6 Armed Forces Americas
```

``` r
head(abbs)
```

```
#>   state_abb
#> 1        AA
#> 2        AE
#> 3        AE
#> 4        AE
#> 5        AE
#> 6        AK
```

``` r
result <- cbind(names, abbs)
head(result)
```

```
#>              state_name state_abb
#> 1               Alabama        AA
#> 2                Alaska        AE
#> 3               Arizona        AE
#> 4              Arkansas        AE
#> 5   Armed Forces Africa        AE
#> 6 Armed Forces Americas        AK
```
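
`cbind()` pairs rows purely by position, so any difference in row order silently mismatches the data. A small sketch with hypothetical data frames (`names_df`, `abbs_df`, and the shared key `id` are assumptions made up for this example) shows how a join on a key avoids the problem:

``` r
library(dplyr)

# Hypothetical data: both tables share a key, `id`, but the rows are ordered differently
names_df <- tibble(
    id = c(1, 2, 3),
    state_name = c("Alabama", "Alaska", "Arizona")
)
abbs_df <- tibble(
    id = c(2, 3, 1),
    state_abb = c("AK", "AZ", "AL")
)

# Binding columns matches rows by position: the abbreviations end up on the wrong states
misaligned <- cbind(names_df, abbs_df["state_abb"])
misaligned

# Joining matches rows by the key: each state gets its own abbreviation
aligned <- left_join(names_df, abbs_df, by = "id")
aligned
```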

---

## Joins

1. `inner_join()`
2. `left_join()` / `right_join()`
3. `full_join()`

Example: `band_members` & `band_instruments`

``` r
band_members
```

```
#> # A tibble: 3 × 2
#>   name  band   
#>   <chr> <chr>  
#> 1 Mick  Stones 
#> 2 John  Beatles
#> 3 Paul  Beatles
```

``` r
band_instruments
```

```
#> # A tibble: 3 × 2
#>   name  plays 
#>   <chr> <chr> 
#> 1 John  guitar
#> 2 Paul  bass  
#> 3 Keith guitar
```

---

## `inner_join()`

``` r
band_members %>%
    inner_join(band_instruments)
```

```
#> # A tibble: 2 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass
```

---

## `full_join()`

``` r
band_members %>%
    full_join(band_instruments)
```

```
#> # A tibble: 4 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass  
#> 4 Keith <NA>    guitar
```

---

## `left_join()`

``` r
band_members %>%
    left_join(band_instruments)
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

---

## `right_join()`

``` r
band_members %>%
    right_join(band_instruments)
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass  
#> 3 Keith <NA>    guitar
```

---

## Specify the joining variable name

``` r
band_members %>%
    left_join(band_instruments)
```

```
#> Joining with `by = join_by(name)`
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

``` r
band_members %>%
    left_join(band_instruments,
*             by = 'name')
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

---

## Specify the joining variable name

If the names differ, use `by = c("left_name" = "right_name")`

``` r
band_members
```

```
#> # A tibble: 3 × 2
#>   name  band   
#>   <chr> <chr>  
#> 1 Mick  Stones 
#> 2 John  Beatles
#> 3 Paul  Beatles
```

``` r
band_instruments2
```

```
#> # A tibble: 3 × 2
#>   artist plays 
#>   <chr>  <chr> 
#> 1 John   guitar
#> 2 Paul   bass  
#> 3 Keith  guitar
```

``` r
band_members %>%
    left_join(band_instruments2,
*             by = c("name" = "artist"))
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```
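
The join message shown earlier (``Joining with `by = join_by(name)` ``) suggests a dplyr version that also supports the `join_by()` helper. Assuming dplyr 1.1.0 or later, the same join can be sketched as:

``` r
library(dplyr)

# `name` comes from the left table, `artist` from the right table
joined <- band_members %>%
    left_join(band_instruments2,
              by = join_by(name == artist))

joined
```

Both spellings should return the same table; `join_by()` reads a little closer to "match `name` on the left to `artist` on the right".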

---

## Specify the joining variable name

Or just rename the joining variable in a pipe

``` r
band_members
```

```
#> # A tibble: 3 × 2
#>   name  band   
#>   <chr> <chr>  
#> 1 Mick  Stones 
#> 2 John  Beatles
#> 3 Paul  Beatles
```

``` r
band_instruments2
```

```
#> # A tibble: 3 × 2
#>   artist plays 
#>   <chr>  <chr> 
#> 1 John   guitar
#> 2 Paul   bass  
#> 3 Keith  guitar
```

``` r
band_members %>%
*   rename(artist = name) %>%
    left_join(band_instruments2,
*             by = "artist")
```

```
#> # A tibble: 3 × 3
#>   artist band    plays 
#>   <chr>  <chr>   <chr> 
#> 1 Mick   Stones  <NA>  
#> 2 John   Beatles guitar
#> 3 Paul   Beatles bass
```

---

## Your turn

1) Create a new data frame called `state_data` by joining the `state_abbs` and `state_regions` data frames. The result should be a data frame with variables `state_name`, `state_abb`, and `state_region`. It should look like this:

``` r
head(state_data)
```

```
#> # A tibble: 6 × 3
#>   state_name state_abb state_region
#>   <chr>      <chr>     <chr>       
#> 1 Alabama    AL        Southeast   
#> 2 Alaska     AK        Pacific     
#> 3 Arizona    AZ        Mountain    
#> 4 Arkansas   AR        Delta States
#> 5 California CA        Pacific     
#> 6 Colorado   CO        Mountain
```

2) Join the `state_data` data frame to the `wildlife_impacts` data frame, adding the variables `state_region` and `state_name`.

``` r
glimpse(wildlife_impacts)
```

```
#> Rows: 56,978
#> Columns: 23
#> $ state_abb             <chr> "FL", "IN", NA, NA, NA, "FL", "FL", NA, NA, "FL", NA, "TX", NA, NA, "NY", NA, NA, "MD", "CA", "AZ", "NC", "TX", NA, NA, "CA", NA, NA, "NM", NA, NA, NA, NA, "CA", "NC", "FL", "FL", "CA", NA, "TX", "CA", "PA", NA, "TX", …
#> $ state_name            <chr> "Florida", "Indiana", NA, NA, NA, "Florida", "Florida", NA, NA, "Florida", NA, "Texas", NA, NA, "New York", NA, NA, "Maryland", "California", "Arizona", "North Carolina", "Texas", NA, NA, "California", NA, NA, "New Mex…
#> $ state_region          <chr> "Southeast", "Corn Belt", NA, NA, NA, "Southeast", "Southeast", NA, NA, "Southeast", NA, "Southern Plains", NA, NA, "Northeast", NA, NA, "Northeast", "Pacific", "Mountain", "Appalachian", "Southern Plains", NA, NA, "Pa…
#> $ incident_date         <dttm> 2018-12-31, 2018-12-29, 2018-12-29, 2018-12-27, 2018-12-27, 2018-12-27, 2018-12-27, 2018-12-26, 2018-12-23, 2018-12-23, 2018-12-23, 2018-12-22, 2018-12-22, 2018-12-22, 2018-12-22, 2018-12-22, 2018-12-21, 2018-12-21, 2…
#> $ airport_id            <chr> "KMIA", "KIND", "ZZZZ", "ZZZZ", "ZZZZ", "KMIA", "KMCO", "ZZZZ", "ZZZZ", "KFLL", "ZZZZ", "KGRK", "ZZZZ", "ZZZZ", "KJFK", "MDPP", "MNMG", "KBWI", "KSMF", "KPHX", "KCLT", "KDFW", "ZZZZ", "ZZZZ", "KSNA", "ZZZZ", "ZZZZ", "K…
#> $ airport               <chr> "MIAMI INTL", "INDIANAPOLIS INTL ARPT", "UNKNOWN", "UNKNOWN", "UNKNOWN", "MIAMI INTL", "ORLANDO INTL", "UNKNOWN", "UNKNOWN", "FORT LAUDERDALE/HOLLYWOOD INTL", "UNKNOWN", "KILLEEN/FT HOOD REGIONAL", "UNKNOWN", "UNKNOWN"…
#> $ operator              <chr> "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICA…
#> $ atype                 <chr> "B-737-800", "B-737-800", "UNKNOWN", "B-737-900", "B-737-800", "A-319", "A-321", "B-737-800", "A-321", "B-737-800", "B-737-800", "EMB-145", "A-319", "A-319", "B-737-800", "B-737-800", "B-737-800", "A-319", "A-319", "B-…
#> $ type_eng              <chr> "D", "D", NA, "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", NA, "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", …
#> $ species_id            <chr> "UNKBL", "R", "R2004", "N5205", "J2139", "UNKB", "UNKBS", "ZT001", "ZT101", "I1301", "UNKB", "O22", "ZX010", "ZX303", "K5114", "UNKBS", "UNKBS", "UNKB", "J2141", "UNKBS", "ZT101", "UNKB", "UNKB", "ZS009", "UNKB", "UNKB…
#> $ species               <chr> "Unknown bird - large", "Owls", "Short-eared owl", "Southern lapwing", "Lesser scaup", "Unknown bird", "Unknown bird - small", "Eastern meadowlark", "Red-winged blackbird", "Cattle egret", "Unknown bird", "Doves", "Pin…
#> $ damage                <chr> "M?", "N", NA, "M?", "M?", "N", "N", "N", "N", "N", "N", "N", "N", "N", NA, "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", NA, "N", NA, "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", NA, "N", "N", "N", NA, "…
#> $ num_engs              <dbl> 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
#> $ incident_month        <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11…
#> $ incident_year         <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 20…
#> $ time_of_day           <chr> "Day", "Night", NA, NA, NA, "Day", "Night", NA, NA, "Day", "Night", "Day", NA, NA, "Day", "Day", "Day", "Day", "Day", "Night", "Day", "Dawn", NA, NA, "Day", NA, NA, "Day", NA, NA, NA, NA, "Day", "Day", "Day", "Day", "D…
#> $ time                  <dbl> 1207, 2355, NA, NA, NA, 955, 948, NA, NA, 1321, 15, 1612, NA, NA, 905, 1457, 1418, 1628, 627, 2130, 719, 747, NA, NA, 1348, NA, NA, 1305, NA, NA, NA, NA, 1345, 944, 1400, 1415, 1150, 800, 1400, 1505, 1731, NA, NA, 1733…
#> $ height                <dbl> 700, 0, NA, NA, NA, NA, 600, NA, NA, 0, NA, 0, NA, NA, 0, 500, 100, 0, 1000, 4500, 300, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, NA, 0, 100, 0, 3027, 0, NA, 0, NA, 200, NA, 0, 300, 8, NA, 500, NA, NA, NA, NA, NA, …
#> $ speed                 <dbl> 200, NA, NA, NA, NA, NA, 145, NA, NA, 130, NA, NA, NA, NA, NA, 160, 150, NA, NA, 250, NA, NA, NA, NA, NA, NA, NA, 160, NA, NA, NA, NA, 100, NA, NA, 100, 150, 130, 130, NA, 150, NA, 150, NA, NA, 140, 144, NA, 145, NA, N…
#> $ phase_of_flt          <chr> "Climb", "Landing Roll", NA, NA, NA, "Approach", "Approach", NA, NA, "Take-off run", NA, "Landing Roll", NA, NA, "Landing Roll", "Approach", "Approach", "Take-off run", "Climb", "Climb", "Approach", "Approach", NA, NA,…
#> $ sky                   <chr> "Some Cloud", NA, NA, NA, NA, NA, "Some Cloud", NA, NA, "No Cloud", NA, "Some Cloud", NA, NA, "Some Cloud", "No Cloud", "No Cloud", NA, "No Cloud", "No Cloud", "No Cloud", "Some Cloud", NA, NA, NA, NA, NA, "Some Cloud"…
#> $ precip                <chr> "None", NA, NA, NA, NA, NA, "None", NA, NA, "None", NA, "None", NA, NA, "None", "None", "None", NA, "None", "None", "None", "None", NA, NA, NA, NA, NA, "None", NA, NA, NA, NA, "None", "None", NA, "None", "None", "None"…
#> $ cost_repairs_infl_adj <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
```

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. Optimization (in general)

### BREAK

### 3. Joins
### 4. .orange[Pilot data cleaning]

---

## Download the [demo-choice-based-conjoint](https://github.com/surveydown-dev/demo-choice-based-conjoint/archive/refs/heads/main.zip) repo

---

# .center[Cleaning surveydown survey data]

<br>

## 1. Open `survey.Rproj`

## 2. Open `code/data_cleaning.R`

---

## Team time

### For the rest of class, work with your teammates to start importing and cleaning your pilot survey data
