class: middle, inverse .leftcol30[ <center> <img src="https://madd.seas.gwu.edu/images/logo.png" width=250> </center> ] .rightcol70[ # Week 8: .fancy[Optimization & MLE] ###
EMSE 6035: Marketing Analytics for Design Decisions ###
John Paul Helveston ###
October 16, 2024 ] --- class: inverse # Quiz 3
.leftcol[ ### Download the template from the #class channel ### Make sure you unzip it! ### When done, submit your `quiz3.qmd` on Blackboard ] .rightcol[ <center> <img src="https://github.com/emse-p4a-gwu/2022-Spring/raw/main/images/quiz_doge.png" width="400"> </center> ] --- class: inverse, middle # Week 8: .fancy[Optimization & MLE] ### 1. Maximum likelihood estimation ### 2. Optimization (in general) ### BREAK ### 3. Joins ### 4. Pilot data cleaning --- class: inverse, middle # Week 8: .fancy[Optimization & MLE] ### 1. .orange[Maximum likelihood estimation] ### 2. Optimization (in general) ### BREAK ### 3. Joins ### 4. Pilot data cleaning --- background-color: #EEEDEE ## .center[Computing the likelihood] .leftcol[ <center> <img src="images/pdf.png" width=100%> </center> ] .rightcol[ `\(x\)`: an observation `\(f(x)\)`: probability of observing `\(x\)` ] --- background-color: #EEEDEE ## .center[Computing the likelihood] .leftcol[ <center> <img src="images/pdf.png" width=100%> </center> ] .rightcol[ `\(x\)`: an observation `\(f(x)\)`: probability of observing `\(x\)` `\(\mathcal{L}(\theta | x)\)`: probability that `\(\theta\)` are the true parameters, given the observed `\(x\)` **We want to estimate `\(\theta\)`** ] --- background-color: #EEEDEE class: center ## We actually compute the _log_-likelihood<br>(converts multiplication to addition) <center> <img src="images/logl.png" width=700> </center> --- class: inverse # Practice Question 1 **Observations** - Height of students (inches): ``` #> [1] 65 69 66 67 68 72 68 69 63 70 ``` a) Let's say we know that the height of students, `\(\tilde{x}\)`, in a classroom follows a normal distribution. A professor obtains the above height measurements of students in her classroom. What is the log-likelihood that `\(\tilde{x} \sim \mathcal{N} (68, 4)\)`? In other words, compute `\(\ln \mathcal{L} (\mu = 68, \sigma = 4)\)`. 
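One way to check part (a) numerically, sketched in base R. This assumes the observations are independent, so the likelihood is a product of normal densities and the log-likelihood is the matching sum. The names `heights` and `log_lik` are illustrative and do not come from the course materials.

```r
# Observed heights (inches), copied from the slide
heights <- c(65, 69, 66, 67, 68, 72, 68, 69, 63, 70)

# Log-likelihood of a normal model, assuming independent observations:
# the sum of the log-density of each observation
# (log_lik is a hypothetical helper name)
log_lik <- function(x, mu, sigma) {
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

ll <- log_lik(heights, mu = 68, sigma = 4)

# The same quantity computed as the log of the product of densities,
# which shows how the log turns multiplication into addition
ll_product <- log(prod(dnorm(heights, mean = 68, sd = 4)))

print(ll)
print(all.equal(ll, ll_product))
```

With many observations the raw product of densities underflows toward zero, which is a practical reason to sum log-densities instead.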
-- b) Compute the log-likelihood function using the same standard deviation `\((\sigma = 4)\)` but with the following different values for the mean, `\(\mu: 66, 67, 68, 69, 70\)`. How do the results compare? Which value for `\(\mu\)` produces the highest log-likelihood? --- class: inverse, middle # Week 8: .fancy[Optimization & MLE] ### 1. Maximum likelihood estimation ### 2. .orange[Optimization (in general)] ### BREAK ### 3. Joins ### 4. Pilot data cleaning --- background-color: #EEEDEE class: center, middle .leftcol40[ # `\(f(x)\)` ] .rightcol60[ <center> <img src="images/fx.png" width=100%> </center> ] --- background-color: #EEEDEE class: center, middle .leftcol40[ <center> <img src="images/first_order.png" width=100%> </center> ] .rightcol60[ <center> <img src="images/fx.png" width=100%> </center> ] --- background-color: #EEEDEE class: center, middle .leftcol40[ <center> <img src="images/second_order.png" width=100%> </center> ] .rightcol60[ <center> <img src="images/fx.png" width=100%> </center> ] --- background-color: #EEEDEE class: center, middle <center> <img src="images/conditions.png" width=1000> </center> --- background-color: #EEEDEE class: center, middle <center> <img src="images/algorithms.png" width=1200> </center> --- class: inverse # Practice Question 2 .leftcol80[ Consider the following function: `$$f(x) = x^2 - 6x$$` The gradient is: `$$\nabla f(x) = 2x - 6$$` Using the starting point `\(x = 1\)` and the step size `\(\gamma = 0.3\)`, apply the gradient descent method to compute the next **three** points in the search algorithm. 
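A minimal sketch of the update rule `\(x_{k+1} = x_k - \gamma \nabla f(x_k)\)` applied to this question, in base R. The function `gradient_descent` and its argument names are hypothetical helpers written for illustration; they are not part of any package used in class.

```r
# Hypothetical helper: run a fixed number of gradient descent steps
# and return every point visited, one row per iteration
gradient_descent <- function(grad, x0, gamma, n_steps) {
  points <- matrix(NA_real_, nrow = n_steps + 1, ncol = length(x0))
  points[1, ] <- x0
  for (k in seq_len(n_steps)) {
    x <- points[k, ]
    # Step against the gradient, scaled by the step size gamma
    points[k + 1, ] <- x - gamma * grad(x)
  }
  points
}

# Gradient of f(x) = x^2 - 6x, as given on the slide
grad_f <- function(x) 2 * x - 6

# Starting point x = 1, step size 0.3, three steps
path <- gradient_descent(grad_f, x0 = 1, gamma = 0.3, n_steps = 3)
print(path[, 1])
```

Each step moves the point toward `\(x = 3\)`, where the gradient `\(2x - 6\)` equals zero.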
] --- background-color: #EEEDEE class: center, middle <center> <img src="images/conditions.png" width=1000> </center> --- class: inverse # Practice Question 3 .leftcol80[ Consider the following function: $$ f(\underline{x}) = x_1^2 + 4x_2^2 $$ The gradient is: $$ \nabla f(\underline{x}) = `\begin{bmatrix} 2x_1 \\ 8x_2 \end{bmatrix}` $$ Using the starting point `\(\underline{x}_0 = [1, 1]\)` and the step size `\(\gamma = 0.15\)`, apply the gradient descent method to compute the next **three** points in the search algorithm. ] --- class: center, middle ## Download the [logitr-cars](https://github.com/jhelvy/logitr-cars/archive/refs/heads/main.zip) repo from GitHub --- # .center[Estimating utility models] <br> .rightcol80[ ## 1. Open `logitr-cars.Rproj` ## 2. Open `code/3.1-model-mnl.R` ] --- background-color: #EEEDEE # Maximum likelihood estimation .leftcol[ <center> <img src="images/mle1.png" width=100%> </center> ] .rightcol[ <center> <img src="images/mle2.png" width=100%> </center> ] --- class: inverse, center # .fancy[Break]
--- class: inverse, middle # Week 8: .fancy[Optimization & MLE] ### 1. Maximum likelihood estimation ### 2. Optimization (in general) ### BREAK ### 3. .orange[Joins] ### 4. Pilot data cleaning --- class: center ## What's wrong with this map? <center> <img src="images/join_fail.png" height=500> </center> --- ### Likely culprit: Merging two columns .leftcol[ ``` r head(names) ``` ``` #> state_name #> 1 Alabama #> 2 Alaska #> 3 Arizona #> 4 Arkansas #> 5 Armed Forces Africa #> 6 Armed Forces Americas ``` ``` r head(abbs) ``` ``` #> state_abb #> 1 AA #> 2 AE #> 3 AE #> 4 AE #> 5 AE #> 6 AK ``` ] -- .rightcol[ ``` r result <- cbind(names, abbs) head(result) ``` ``` #> state_name state_abb #> 1 Alabama AA #> 2 Alaska AE #> 3 Arizona AE #> 4 Arkansas AE #> 5 Armed Forces Africa AE #> 6 Armed Forces Americas AK ``` ] --- ## Joins 1. `inner_join()` 2. `left_join()` / `right_join()` 3. `full_join()` -- ‍Example: `band_members` & `band_instruments` .leftcol[ ``` r band_members ``` ``` #> # A tibble: 3 × 2 #> name band #> <chr> <chr> #> 1 Mick Stones #> 2 John Beatles #> 3 Paul Beatles ``` ] .rightcol[ ``` r band_instruments ``` ``` #> # A tibble: 3 × 2 #> name plays #> <chr> <chr> #> 1 John guitar #> 2 Paul bass #> 3 Keith guitar ``` ] --- .leftcol[ ## `inner_join()` ``` r band_members %>% inner_join(band_instruments) ``` ``` #> # A tibble: 2 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bass ``` ] .rightcol[ <br> <center> <img src="images/inner_join.gif"> </center> ] --- .leftcol[ ## `full_join()` ``` r band_members %>% full_join(band_instruments) ``` ``` #> # A tibble: 4 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass #> 4 Keith <NA> guitar ``` ] .rightcol[ <br> <center> <img src="images/full_join.gif"> </center> ] --- .leftcol[ ## `left_join()` ``` r band_members %>% left_join(band_instruments) ``` ``` #> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 
Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass ``` ] .rightcol[ <br> <center> <img src="images/left_join.gif"> </center> ] --- .leftcol[ ## `right_join()` ``` r band_members %>% right_join(band_instruments) ``` ``` #> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bass #> 3 Keith <NA> guitar ``` ] .rightcol[ <br> <center> <img src="images/right_join.gif"> </center> ] --- ## Specify the joining variable name .leftcol[ ``` r band_members %>% left_join(band_instruments) ``` ``` #> Joining with `by = join_by(name)` ``` ``` #> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass ``` ] -- .rightcol[ ``` r band_members %>% left_join(band_instruments, * by = 'name') ``` ``` #> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass ``` ] --- ## Specify the joining variable name If the names differ, use `by = c("left_name" = "joining_name")` -- .leftcol[ ``` r band_members ``` ``` #> # A tibble: 3 × 2 #> name band #> <chr> <chr> #> 1 Mick Stones #> 2 John Beatles #> 3 Paul Beatles ``` ``` r band_instruments2 ``` ``` #> # A tibble: 3 × 2 #> artist plays #> <chr> <chr> #> 1 John guitar #> 2 Paul bass #> 3 Keith guitar ``` ] -- .rightcol[ ``` r band_members %>% left_join(band_instruments2, * by = c("name" = "artist")) ``` ``` #> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass ``` ] --- ## Specify the joining variable name Or just rename the joining variable in a pipe .leftcol[ ``` r band_members ``` ``` #> # A tibble: 3 × 2 #> name band #> <chr> <chr> #> 1 Mick Stones #> 2 John Beatles #> 3 Paul Beatles ``` ``` r band_instruments2 ``` ``` #> # A tibble: 3 × 2 #> artist plays #> <chr> <chr> #> 1 John guitar #> 2 Paul bass #> 3 Keith guitar ``` ] .rightcol[ ``` r band_members %>% * 
rename(artist = name) %>% left_join(band_instruments2, * by = "artist") ``` ``` #> # A tibble: 3 × 3 #> artist band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass ``` ] --- class: inverse
## Your turn .leftcol[.font80[ 1) Create a new data frame called `state_data` by joining the `state_abbs` and `state_regions` data frames. The result should be a data frame with variables `state_name`, `state_abb`, and `state_region`. It should look like this: .code70[ ``` r head(state_data) ``` ``` #> # A tibble: 6 × 3 #> state_name state_abb state_region #> <chr> <chr> <chr> #> 1 Alabama AL Southeast #> 2 Alaska AK Pacific #> 3 Arizona AZ Mountain #> 4 Arkansas AR Delta States #> 5 California CA Pacific #> 6 Colorado CO Mountain ``` ]]] .rightcol[.font80[ 2) Join the `state_data` data frame to the `wildlife_impacts` data frame, adding the variables `state_region` and `state_name`. .code50[ ``` r glimpse(wildlife_impacts) ``` ``` #> Rows: 56,978 #> Columns: 23 #> $ state_abb <chr> "FL", "IN", NA, NA, NA, "FL", "FL", NA, NA, "FL", NA, "TX", NA, NA, "NY", NA, NA, "MD", "CA", "AZ", "NC", "TX", NA, NA, "CA", NA, NA, "NM", NA, NA, NA, NA, "CA", "NC", "FL", "FL", "CA", NA, "TX", "CA", "PA", NA, "TX", … #> $ state_name <chr> "Florida", "Indiana", NA, NA, NA, "Florida", "Florida", NA, NA, "Florida", NA, "Texas", NA, NA, "New York", NA, NA, "Maryland", "California", "Arizona", "North Carolina", "Texas", NA, NA, "California", NA, NA, "New Mex… #> $ state_region <chr> "Southeast", "Corn Belt", NA, NA, NA, "Southeast", "Southeast", NA, NA, "Southeast", NA, "Southern Plains", NA, NA, "Northeast", NA, NA, "Northeast", "Pacific", "Mountain", "Appalachian", "Southern Plains", NA, NA, "Pa… #> $ incident_date <dttm> 2018-12-31, 2018-12-29, 2018-12-29, 2018-12-27, 2018-12-27, 2018-12-27, 2018-12-27, 2018-12-26, 2018-12-23, 2018-12-23, 2018-12-23, 2018-12-22, 2018-12-22, 2018-12-22, 2018-12-22, 2018-12-22, 2018-12-21, 2018-12-21, 2… #> $ airport_id <chr> "KMIA", "KIND", "ZZZZ", "ZZZZ", "ZZZZ", "KMIA", "KMCO", "ZZZZ", "ZZZZ", "KFLL", "ZZZZ", "KGRK", "ZZZZ", "ZZZZ", "KJFK", "MDPP", "MNMG", "KBWI", "KSMF", "KPHX", "KCLT", "KDFW", "ZZZZ", "ZZZZ", "KSNA", "ZZZZ", "ZZZZ", "K… #> $ airport 
<chr> "MIAMI INTL", "INDIANAPOLIS INTL ARPT", "UNKNOWN", "UNKNOWN", "UNKNOWN", "MIAMI INTL", "ORLANDO INTL", "UNKNOWN", "UNKNOWN", "FORT LAUDERDALE/HOLLYWOOD INTL", "UNKNOWN", "KILLEEN/FT HOOD REGIONAL", "UNKNOWN", "UNKNOWN"… #> $ operator <chr> "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICAN AIRLINES", "AMERICA… #> $ atype <chr> "B-737-800", "B-737-800", "UNKNOWN", "B-737-900", "B-737-800", "A-319", "A-321", "B-737-800", "A-321", "B-737-800", "B-737-800", "EMB-145", "A-319", "A-319", "B-737-800", "B-737-800", "B-737-800", "A-319", "A-319", "B-… #> $ type_eng <chr> "D", "D", NA, "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", NA, "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", … #> $ species_id <chr> "UNKBL", "R", "R2004", "N5205", "J2139", "UNKB", "UNKBS", "ZT001", "ZT101", "I1301", "UNKB", "O22", "ZX010", "ZX303", "K5114", "UNKBS", "UNKBS", "UNKB", "J2141", "UNKBS", "ZT101", "UNKB", "UNKB", "ZS009", "UNKB", "UNKB… #> $ species <chr> "Unknown bird - large", "Owls", "Short-eared owl", "Southern lapwing", "Lesser scaup", "Unknown bird", "Unknown bird - small", "Eastern meadowlark", "Red-winged blackbird", "Cattle egret", "Unknown bird", "Doves", "Pin… #> $ damage <chr> "M?", "N", NA, "M?", "M?", "N", "N", "N", "N", "N", "N", "N", "N", "N", NA, "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", NA, "N", NA, "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", NA, "N", "N", "N", NA, "… #> $ num_engs <dbl> 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,… #> $ incident_month <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11… #> $ incident_year <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 20… #> $ time_of_day <chr> "Day", "Night", NA, NA, NA, "Day", "Night", NA, NA, "Day", "Night", "Day", NA, NA, "Day", "Day", "Day", "Day", "Day", "Night", "Day", "Dawn", NA, NA, "Day", NA, NA, "Day", NA, NA, NA, NA, "Day", "Day", "Day", "Day", "D… #> $ time <dbl> 1207, 2355, NA, NA, NA, 955, 948, NA, NA, 1321, 15, 1612, NA, NA, 905, 1457, 1418, 1628, 627, 2130, 719, 747, NA, NA, 1348, NA, NA, 1305, NA, NA, NA, NA, 1345, 944, 1400, 1415, 1150, 800, 1400, 1505, 1731, NA, NA, 1733… #> $ height <dbl> 700, 0, NA, NA, NA, NA, 600, NA, NA, 0, NA, 0, NA, NA, 0, 500, 100, 0, 1000, 4500, 300, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, NA, 0, 100, 0, 3027, 0, NA, 0, NA, 200, NA, 0, 300, 8, NA, 500, NA, NA, NA, NA, NA, … #> $ speed <dbl> 200, NA, NA, NA, NA, NA, 145, NA, NA, 130, NA, NA, NA, NA, NA, 160, 150, NA, NA, 250, NA, NA, NA, NA, NA, NA, NA, 160, NA, NA, NA, NA, 100, NA, NA, 100, 150, 130, 130, NA, 150, NA, 150, NA, NA, 140, 144, NA, 145, NA, N… #> $ phase_of_flt <chr> "Climb", "Landing Roll", NA, NA, NA, "Approach", "Approach", NA, NA, "Take-off run", NA, "Landing Roll", NA, NA, "Landing Roll", "Approach", "Approach", "Take-off run", "Climb", "Climb", "Approach", "Approach", NA, NA,… #> $ sky <chr> "Some Cloud", NA, NA, NA, NA, NA, "Some Cloud", NA, NA, "No Cloud", NA, "Some Cloud", NA, NA, "Some Cloud", "No Cloud", "No Cloud", NA, "No Cloud", "No Cloud", "No Cloud", "Some Cloud", NA, NA, NA, NA, NA, "Some Cloud"… #> $ precip <chr> "None", NA, NA, NA, NA, NA, "None", NA, NA, "None", NA, "None", NA, NA, "None", "None", "None", NA, "None", "None", "None", "None", NA, NA, NA, NA, NA, 
"None", NA, NA, NA, NA, "None", "None", NA, "None", "None", "None"… #> $ cost_repairs_infl_adj <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA… ``` ]]] --- class: inverse, middle # Week 8: .fancy[Optimization & MLE] ### 1. Maximum likelihood estimation ### 2. Optimization (in general) ### BREAK ### 3. Joins ### 4. .orange[Pilot data cleaning] --- class: center, middle ## Download the [demo-choice-based-conjoint](https://github.com/surveydown-dev/demo-choice-based-conjoint/archive/refs/heads/main.zip) repo --- # .center[Cleaning surveydown survey data] <br> .rightcol80[ ## 1. Open `survey.Rproj` ## 2. Open `code/data_cleaning.R` ] --- class: inverse ## Team time .leftcol80[ ### For the rest of class, work with your team mates to start importing and cleaning your pilot survey data ]