# Week 8: .fancy[Optimization & MLE]

### EMSE 6035: Marketing Analytics for Design Decisions
### John Paul Helveston
### October 19, 2022

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. Optimization (in general)

### BREAK

### 3. Joins
### 4. Pilot data cleaning

---

# Week 8: .fancy[Optimization & MLE]

### 1. .orange[Maximum likelihood estimation]
### 2. Optimization (in general)

### BREAK

### 3. Joins
### 4. Pilot data cleaning

---

background-color: #EEEDEE
## .center[Computing the likelihood]

`$x$`: an observation

`$f(x)$`: probability of observing `$x$`

---

background-color: #EEEDEE

## .center[Computing the likelihood]

`$x$`: an observation

`$f(x)$`: probability of observing `$x$`

`$\mathcal{L}(\theta | x)$`: likelihood that `$\theta$` are the true parameters, given that we observed `$x$`

**We want to estimate `$\theta$`**

---

background-color: #EEEDEE
class: center

## We actually compute the _log_-likelihood<br>(converts multiplication to addition)
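
Written out, and assuming the observations `$x_1, \dots, x_n$` are independent (the subscripted notation and the `$f(x_i | \theta)$` form are assumptions added here, not taken from the slides), the likelihood is a product of densities and its log is a sum:

`$$\mathcal{L}(\theta | x) = \prod_{i=1}^{n} f(x_i | \theta)$$`

`$$\ln \mathcal{L}(\theta | x) = \sum_{i=1}^{n} \ln f(x_i | \theta)$$`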

---

# Practice Question 1

**Observations** - Height of students (inches):

```
#>  [1] 65 69 66 67 68 72 68 69 63 70
```

a) Let's say we know that the height of students, `$\tilde{x}$`, in a classroom follows a normal distribution. A professor obtains the above height measurements from the students in her classroom. What is the log-likelihood that `$\tilde{x} \sim \mathcal{N} (68, 4)$`? In other words, compute `$\ln \mathcal{L} (\mu = 68, \sigma = 4)$`.

b) Compute the log-likelihood function using the same standard deviation `$(\sigma = 4)$` but with the following different values for the mean, `$\mu: 66, 67, 68, 69, 70$`. How do the results compare? Which value for `$\mu$` produces the highest log-likelihood?
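
One possible way to set this up in R is sketched below. It treats the 4 in `$\mathcal{N}(68, 4)$` as the standard deviation, as the question states, and the helper name `log_lik` is made up for this example.

```r
# Observed heights (inches), copied from the output above
x <- c(65, 69, 66, 67, 68, 72, 68, 69, 63, 70)

# Log-likelihood of a normal model: the sum of the log densities
# (`log_lik` is a hypothetical helper name, not part of any package)
log_lik <- function(mu, sigma, x) {
    sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Part (a): a single value of mu
ll_a <- log_lik(mu = 68, sigma = 4, x = x)

# Part (b): the same calculation over several candidate means
mus <- 66:70
ll_b <- sapply(mus, log_lik, sigma = 4, x = x)
data.frame(mu = mus, log_lik = ll_b)

# The candidate mean with the highest log-likelihood
mus[which.max(ll_b)]
```

Because `dnorm(..., log = TRUE)` returns log densities directly, the code sums them instead of multiplying densities and taking the log at the end.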

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. .orange[Optimization (in general)]

### BREAK

### 3. Joins
### 4. Pilot data cleaning

---

background-color: #EEEDEE
class: center, middle

# `$f(x)$`

---

# Practice Question 2

Consider the following function:

`$$f(x) = x^2 - 6x$$`

The gradient is:

`$$\nabla f(x) = 2x - 6$$`

Using the starting point `$x = 1$` and the step size `$\gamma =  0.3$`, apply the gradient descent method to compute the next **three** points in the search algorithm.
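
A minimal sketch of the iteration in R, assuming the standard update rule `$x_{k+1} = x_k - \gamma \nabla f(x_k)$` (the slide does not spell the rule out, and the variable names are chosen for this example):

```r
# Objective and its gradient, as given above
f <- function(x) x^2 - 6 * x
grad_f <- function(x) 2 * x - 6

gamma <- 0.3   # step size
x <- 1         # starting point
path <- x      # record every point visited, starting point included

# Take three steps in the direction of the negative gradient
for (k in 1:3) {
    x <- x - gamma * grad_f(x)
    path <- c(path, x)
}

path      # starting point followed by the next three points
f(path)   # the objective should decrease at every step
```

The points move toward `$x = 3$`, where the gradient `$2x - 6$` is zero.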

---

# Practice Question 3

Consider the following function:

`$$f(\underline{x}) = x_1^2 + 4x_2^2$$`

The gradient is:

`$$\nabla f(\underline{x}) = \begin{bmatrix} 2x_1 \\ 8x_2 \end{bmatrix}$$`

Using the starting point `$\underline{x}_0 = [1, 1]$` and the step size `$\gamma =  0.15$`, apply the gradient descent method to compute the next **three** points in the search algorithm.
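
The same idea carries over to two variables by treating `$\underline{x}$` as a vector. The sketch below again assumes the standard update `$\underline{x}_{k+1} = \underline{x}_k - \gamma \nabla f(\underline{x}_k)$`; the names `grad_f` and `path` are chosen for this example.

```r
# Objective and gradient, as given above; x is a length-2 vector
f <- function(x) x[1]^2 + 4 * x[2]^2
grad_f <- function(x) c(2 * x[1], 8 * x[2])

gamma <- 0.15    # step size
x <- c(1, 1)     # starting point

# One row per point visited, starting point in row 1
path <- matrix(x, nrow = 1, dimnames = list(NULL, c("x1", "x2")))

# Take three steps in the direction of the negative gradient
for (k in 1:3) {
    x <- x - gamma * grad_f(x)
    path <- rbind(path, x, deparse.level = 0)
}

path
```

Note that `$x_2$` overshoots zero and changes sign on each step, because the function is steeper in that direction and the same step size is used for both coordinates.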

---

## Download the [logitr-cars](https://github.com/emse-madd-gwu/logitr-cars) repo from GitHub

---

# .center[Estimating utility models]

<br>

## 1. Open `logitr-cars.Rproj`

## 2. Open `code/3.1-model-mnl.R`

---

background-color: #EEEDEE

# Maximum likelihood estimation
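
Maximum likelihood estimation can be framed as an optimization problem: search for the parameter values that maximize the log-likelihood. As a rough illustration (not the method used in `logitr`), the sketch below reuses the student heights from Practice Question 1, fixes `$\sigma = 4$` as that question does, and lets base R's `optim()` search for `$\mu$`. Since `optim()` minimizes by default, the function passed to it is the negative log-likelihood.

```r
# Student heights (inches) from Practice Question 1
x <- c(65, 69, 66, 67, 68, 72, 68, 69, 63, 70)

# Negative log-likelihood as a function of mu, with sigma fixed at 4
# (fixing sigma is an assumption carried over from the practice question)
neg_log_lik <- function(mu) {
    -sum(dnorm(x, mean = mu, sd = 4, log = TRUE))
}

# Start the search at mu = 66 and minimize with a gradient-based method
fit <- optim(par = 66, fn = neg_log_lik, method = "BFGS")

fit$par    # estimated mu
mean(x)    # for a normal model, the MLE of mu is the sample mean
```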

---

# .fancy[Break]

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. Optimization (in general)

### BREAK

### 3. .orange[Joins]
### 4. Pilot data cleaning

---

## What's wrong with this map?

---

### Likely culprit: Merging two columns

```r
head(names)
```

```
#>              state_name
#> 1               Alabama
#> 2                Alaska
#> 3               Arizona
#> 4              Arkansas
#> 5   Armed Forces Africa
#> 6 Armed Forces Americas
```

```r
head(abbs)
```

```
#>   state_abb
#> 1        AA
#> 2        AE
#> 3        AE
#> 4        AE
#> 5        AE
#> 6        AK
```

```r
result <- cbind(names, abbs)
head(result)
```

```
#>              state_name state_abb
#> 1               Alabama        AA
#> 2                Alaska        AE
#> 3               Arizona        AE
#> 4              Arkansas        AE
#> 5   Armed Forces Africa        AE
#> 6 Armed Forces Americas        AK
```
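
To see why position-based binding is fragile, here is a small sketch with made-up lookup tables. The `names` and `abbs` data frames above have no shared column, so the key column `fips` and the data frame names below are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical tables that share a key column, `fips`
names_keyed <- data.frame(
    fips = c(1, 2, 4),
    state_name = c("Alabama", "Alaska", "Arizona")
)
abbs_keyed <- data.frame(
    fips = c(4, 1, 2),               # same states, different row order
    state_abb = c("AZ", "AL", "AK")
)

# cbind() pairs rows by position, so Alabama lands next to "AZ"
by_position <- cbind(names_keyed, abbs_keyed["state_abb"])

# left_join() pairs rows by the value of the key, whatever the row order
by_key <- names_keyed %>%
    left_join(abbs_keyed, by = "fips")

by_position
by_key
```

A join only lines rows up correctly when both data frames carry a column that identifies the same thing in each.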

---

## Joins

1. `inner_join()`
2. `left_join()` / `right_join()`
3. `full_join()`

Example: `band_members` & `band_instruments`

```r
band_members
```

```
#> # A tibble: 3 × 2
#>   name  band   
#>   <chr> <chr>  
#> 1 Mick  Stones 
#> 2 John  Beatles
#> 3 Paul  Beatles
```

```r
band_instruments
```

```
#> # A tibble: 3 × 2
#>   name  plays 
#>   <chr> <chr> 
#> 1 John  guitar
#> 2 Paul  bass  
#> 3 Keith guitar
```

---

## `inner_join()`

```r
band_members %>%
    inner_join(band_instruments)
```

```
#> # A tibble: 2 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass
```

---

## `full_join()`

```r
band_members %>%
    full_join(band_instruments)
```

```
#> # A tibble: 4 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass  
#> 4 Keith <NA>    guitar
```

---

## `left_join()`

```r
band_members %>%
    left_join(band_instruments)
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

---

## `right_join()`

```r
band_members %>%
    right_join(band_instruments)
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 John  Beatles guitar
#> 2 Paul  Beatles bass  
#> 3 Keith <NA>    guitar
```

---

## Specify the joining variable name

```r
band_members %>%
    left_join(band_instruments)
```

```
#> Joining, by = "name"
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

```r
band_members %>%
    left_join(band_instruments,
*             by = 'name')
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

---

## Specify the joining variable name

If the joining variable has a different name in each data frame, use `by = c("left_name" = "right_name")`

```r
band_members
```

```
#> # A tibble: 3 × 2
#>   name  band   
#>   <chr> <chr>  
#> 1 Mick  Stones 
#> 2 John  Beatles
#> 3 Paul  Beatles
```

```r
band_instruments2
```

```
#> # A tibble: 3 × 2
#>   artist plays 
#>   <chr>  <chr> 
#> 1 John   guitar
#> 2 Paul   bass  
#> 3 Keith  guitar
```

```r
band_members %>%
    left_join(band_instruments2,
*             by = c("name" = "artist"))
```

```
#> # A tibble: 3 × 3
#>   name  band    plays 
#>   <chr> <chr>   <chr> 
#> 1 Mick  Stones  <NA>  
#> 2 John  Beatles guitar
#> 3 Paul  Beatles bass
```

---

## Specify the joining variable name

Or just rename the joining variable in a pipe

```r
band_members
```

```
#> # A tibble: 3 × 2
#>   name  band   
#>   <chr> <chr>  
#> 1 Mick  Stones 
#> 2 John  Beatles
#> 3 Paul  Beatles
```

```r
band_instruments2
```

```
#> # A tibble: 3 × 2
#>   artist plays 
#>   <chr>  <chr> 
#> 1 John   guitar
#> 2 Paul   bass  
#> 3 Keith  guitar
```

```r
band_members %>%
*   rename(artist = name) %>%
    left_join(band_instruments2,
*             by = "artist")
```

```
#> # A tibble: 3 × 3
#>   artist band    plays 
#>   <chr>  <chr>   <chr> 
#> 1 Mick   Stones  <NA>  
#> 2 John   Beatles guitar
#> 3 Paul   Beatles bass
```

---
class: inverse

## Your turn

1. Write code to read in the `state_abbs.csv` and `state_regions.csv` data files in the "data" folder.
2. Create a new data frame called `states` by joining the two data frames `state_abbs` and `state_regions` together. The result should be a data frame with the variables `region`, `name`, and `abb`.

Your result should look like this:

```r
head(states)
```

```
#> # A tibble: 6 × 3
#>   region    name          abb  
#>   <chr>     <chr>         <chr>
#> 1 Northeast Maine         ME   
#> 2 Northeast New Hampshire NH   
#> 3 Northeast Vermont       VT   
#> 4 Northeast Massachusetts MA   
#> 5 Northeast Rhode Island  RI   
#> 6 Northeast Connecticut   CT
```

---

# Week 8: .fancy[Optimization & MLE]

### 1. Maximum likelihood estimation
### 2. Optimization (in general)

### BREAK

### 3. Joins
### 4. .orange[Pilot data cleaning]

---

## Download the [formr4conjoint](https://github.com/jhelvy/formr4conjoint) repo from GitHub

---

# .center[Cleaning formr survey data]

<br>

## 1. Open `formr4conjoint.Rproj`

## 2. Open `code/data_cleaning.R`

---

## Team time

### For the rest of class, work with your teammates to start importing and cleaning your pilot survey data.