Mobile AI Functionality: Final Analysis

Final analysis

Published

December 8, 2024

Abstract

This study presents a market analysis for mobile AI software intended for personal users. The analysis uses survey data to estimate a utility model for the software across three attributes: price, functionality, and question limit. Price is defined as the monthly cost of the software; functionality is the number of different functions (texting, calendar, social media, ordering food, etc.) the AI software can perform; and question limit is the number of requests a user can make per day. The survey solicitation yielded 221 initial responses; after processing and cleaning, 181 respondents remained for analysis. Through logit modeling, willingness-to-pay calculations, comparisons against market competitors, and sensitivity analysis, we find that demand for our product is highly price sensitive. Additionally, because competitors already offer low-cost, high-functionality models, a robust product offering would be necessary to achieve a market share greater than 10-20%.

Introduction

The product within this study is AI software technology on personal mobile phones. The purpose of this technology is to enhance a user’s experience with their personal device by augmenting its software intelligence for tasks such as writing emails, generating ideas, managing notifications, performing actions such as ride hailing, and much more. Figure 1 shows a high-level diagram of how the technology functions.

Figure 1: AI mobile software diagram


The attributes of AI software investigated in this study are price, functionality, and question limit. These attributes were selected because they are each features that other entries in the space advertise to consumers. These competitors are 1) Apple Siri with Apple Intelligence, 2) ChatGPT, and 3) Google Gemini. At the time of writing, Apple Intelligence is in beta and not yet available to customers [1]. Apple Intelligence is a free offering for current Apple device owners intended to enhance the integrated software-hardware experience; ChatGPT is a monthly-subscription Large Language Model (LLM) that operates in a web browser or within applications; and Google Gemini has multiple subscription levels, ranging from a free tier built into the OS to paid subscriptions bundled with other Google services such as Gmail.

Survey Design

Eligibility Requirements

The primary eligibility requirement for the survey was that respondents needed to be at least 18 years old, as reflected in the consent form. This requirement ensures that respondents are legally considered adults and capable of providing informed consent for participating in research. Additionally, participants were required to acknowledge their voluntary participation, which ensures ethical compliance in research studies. We also collected critical demographic information such as age, gender, income, race, and industry of employment, to understand the diversity of respondents and to control for these variables during analysis. These demographic questions help tailor insights to specific population segments and ensure the responses are relevant across various groups.

Educational Material Presented

To ensure respondents were adequately informed before answering the conjoint questions, we provided a section called “Education Information.” This included a detailed explanation of the key attributes of the AI software, such as Price, Functionality, and Question Limit. These attributes were broken down into simple terms so that respondents could understand the differences between the options. This educational material ensured that the respondents had the necessary context to make informed choices in the conjoint questions.

Attributes and Levels for Conjoint Choice Questions

The conjoint survey aimed to understand preferences for a hypothetical AI-enabled smartphone service. The three main attributes selected were:

1. Price: The amount you are willing to pay per month for the AI software service. Levels at $5, $10, and $25.

2. Functionality: The AI software can assist with or complete various tasks such as posting on social media, reading text messages, ordering food, making reservations, transcribing calls, and more.

The functionality of the AI is categorized into three levels: Low, Medium, and High.

  • Low functionality: Complete a task within one source (e.g., edit grammar of an email draft)

  • Medium functionality: Complete a task with up to three sources (e.g., check calendar to schedule a video call via email)

  • High functionality: Complete a task with access to all relevant sources (i.e., cross-reference calendar, email, messages, and a flight booking website to help book the best flight)

3. Question limit: The number of questions or requests that you can ask the AI software per day (e.g., each input, such as “can you revise this email?” counts as a question). Levels at 50, 250, and 500 questions per day.

These attributes and levels were chosen based on their relevance to potential consumers of AI services and to capture trade-offs between cost and service functionality. Each question presented 3 alternatives (combinations of price, functionality, and question limit) for the respondent to choose from. There were a total of 6 choice questions per respondent, enough to cover variability in preferences without overwhelming the participant.
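As a concrete illustration of the design space, the three attributes and levels above define a 3 × 3 × 3 full factorial of 27 candidate profiles from which each choice question’s three alternatives are drawn. The R sketch below enumerates the profiles and samples one example question; it is illustrative only and not the exact script used to field the survey.

```r
# Full factorial of the attribute levels described above (3 x 3 x 3 = 27 profiles).
profiles <- expand.grid(
  price          = c(5, 10, 25),                   # $ per month
  functionality  = c("Low", "Medium", "High"),
  question_limit = c(50, 250, 500)                 # requests per day
)
nrow(profiles)  # 27 candidate profiles

# Draw three distinct profiles to form one example choice question.
set.seed(42)
one_question <- profiles[sample(nrow(profiles), 3), ]
one_question
```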

No-Choice Option

A “no-choice” option was not explicitly included in this survey. All respondents were required to choose among the provided alternatives in each conjoint question. The absence of a no-choice option was deliberate to force participants to make a trade-off among the options, ensuring that their preferences were captured more clearly.

Example Conjoint Question

An example conjoint question from the survey would look like this:

“If these were your only options, which AI-enabled smartphone software service would you choose?”

  1. Price: $10
  2. Functionality: Low
  3. Question Limit: 50

The respondent would choose between several combinations of these attributes.

See Figure 2 below for a screenshot of one of the conjoint questions from the survey.

Figure 2: Conjoint question example


This design approach ensures that respondents are adequately prepared, the choice questions are relevant and balanced, and meaningful insights into preferences for AI-enabled smartphone services can be derived.

Changes in Survey from Pilot

The only major change between the pilot survey and the final survey was changing the highest question-limit level from unlimited to 500. This allows the utility model to treat question limit as a continuous value rather than a discrete one.

Data Analysis

Sample description

The final survey involved a total of 221 respondents, with a potential of 1,326 conjoint question answers (6 questions per respondent). The target demographic of this study focused on smartphone users who rely on their devices for more than 50% of their daily tasks. Respondents for the final survey were sourced beyond the initial sample of classmates to ensure a broader representation within the defined target demographic.

Raw data summary

Below is a summary table of the survey respondents’ demographic and participant details (before any processing or cleaning).

| Attribute | Total n | # of unique responses | Most common response | # of most common response |
|---|---|---|---|---|
| Age | 221 | 8 | 25-34 | 65 |
| Gender | 221 | 4 | male | 106 |
| Income | 221 | 9 | 25k_to_49k | 44 |
| Job | 221 | 16 | other | 55 |
| Literate | 221 | 6 | 4 | 73 |

Table 1: Preprocessed respondent details


Data Cleaning

In the data cleaning process, several steps were taken to ensure that only valid and reliable responses were kept for further analysis. Before cleaning there were 221 responses; after cleaning, 181 responses remained, providing a total of 1,086 usable conjoint question answers for model development and analysis. Here is a detailed explanation of the filtering process applied:

  1. Initial Data Formatting and Time Calculation: First, we computed the total time each respondent took to complete the survey (time_total) as well as the specific time spent on the conjoint choice questions (time_cbc_total). This helped in identifying the pacing of respondents and spotting potential anomalies.

  2. Removing Incomplete Responses: We filtered out any respondents who did not complete all six choice questions. If any cbc_qX field was missing (NA), that respondent was excluded from the analysis. This ensured that all data used in the analysis came from fully completed surveys. After this step, the dataset was reduced to retain only those respondents who answered all required questions.

  3. Removing Responses with Identical Answers: Respondents who selected the same alternative for all choice questions were flagged and removed. The rationale behind this was to avoid keeping responses where participants might have been disengaged or providing careless answers (i.e., choosing the same option without thoughtful consideration). We created a cbc_all_same variable to track this, and after filtering out these respondents, we reduced the dataset further to only include those with varied and potentially more thoughtful answers.

  4. Filtering Based on Completion Time: We examined the time each respondent spent on the survey, particularly focusing on the total time (time_min_total) and the time spent on choice questions (time_min_cbc). Respondents who completed the survey too quickly were identified by calculating the 10th percentile of total survey completion time (time_10). Those who finished below this threshold were likely to have rushed through the survey, so they were excluded to maintain data quality. This helped in eliminating potentially low-effort or careless responses that might distort the results.

  5. Summary of Acceptable vs. Rejected Responses: After applying the above filters, a certain number of responses were deemed acceptable, while others were rejected based on the criteria:

    Rejected Responses: Respondents who did not complete all questions, gave identical answers, or finished the survey too quickly

    Accepted Responses: Those who provided complete answers, exhibited variability in their responses, and took a reasonable amount of time to finish the survey.

    Of the 221 initial responses, 181 were accepted and 40 were rejected under these criteria (the counts were verified with the nrow() function at each step of the cleaning code), so only high-quality responses were kept for further analysis. The filtering processes, such as removing incomplete surveys, excluding respondents who provided identical answers, and filtering based on time, were essential to ensure data quality. These criteria were selected to:

    • Minimize the impact of low-effort or invalid responses

    • Ensure that the retained data came from engaged respondents who gave meaningful and thoughtful answers

    • Improve the reliability and validity of the conjoint analysis results by focusing on data from respondents who followed survey instructions and took a reasonable amount of time to complete it

By implementing these cleaning steps, we aimed to maintain the integrity of the data and ensure that the results would be based on high-quality responses.
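The cleaning rules above can be expressed compactly in R. The sketch below assumes a raw survey export named raw with choice columns cbc_q1 through cbc_q6 and a completion-time column time_min_total, following the variable names referenced in the text; the report’s actual cleaning script is not reproduced here, so the details are illustrative.

```r
library(dplyr)

# Minimal sketch of the cleaning rules described above; `raw`, cbc_q1..cbc_q6,
# and time_min_total are assumed names based on the variables cited in the text.
cbc_cols <- paste0("cbc_q", 1:6)
time_10  <- quantile(raw$time_min_total, 0.10, na.rm = TRUE)  # 10th-percentile completion time

clean <- raw %>%
  # 1. Keep only respondents who answered all six choice questions
  filter(if_all(all_of(cbc_cols), ~ !is.na(.x))) %>%
  # 2. Flag and drop respondents who chose the same alternative every time
  rowwise() %>%
  mutate(cbc_all_same = n_distinct(c_across(all_of(cbc_cols))) == 1) %>%
  ungroup() %>%
  filter(!cbc_all_same) %>%
  # 3. Drop respondents who finished faster than the 10th percentile
  filter(time_min_total >= time_10)

c(before = nrow(raw), after = nrow(clean))  # 221 before cleaning, 181 after
```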

Demographic summary

Below is a breakdown of the demographic characteristics based on the cleaned, collected survey data:

  1. Age: The most represented age group was 25-34, with 65 respondents.

Figure 3: Respondent age


  2. Gender: Male respondents were the majority, comprising 106 of the valid responses, as shown in Figure 4.

Figure 4: Respondent Gender


  3. Income: The income bracket “$25,000 to $49,999” was the most common, represented by 44 respondents, as shown in Figure 5.

Figure 5: Respondent Income Level


  4. Digital Literacy: The most common self-rating for digital literacy was “4,” with 73 respondents selecting this option, as shown in Figure 6. A higher score indicates higher self-reported digital literacy.

Figure 6: Respondent Digital Literacy


  5. Functionality: Nearly all respondents (more than 90%) use their phones for texting, email, online shopping, social media, and listening to music, as shown in Table 2 below.

| Function | n | Percent of total |
|---|---|---|
| Text messaging | 176 | 97 |
| Online shopping | 176 | 97 |
| Email | 171 | 94 |
| Browsing social media | 168 | 93 |
| Listening to music | 166 | 92 |
| Voice calling | 150 | 83 |
| Ordering food delivery | 127 | 70 |
| Managing your calendar | 125 | 69 |
| Requesting rides | 112 | 62 |
| Productivity | 110 | 61 |

Table 2: Respondent mobile phone use cases


  6. Race: Around two-thirds of respondents identified as white, ~20% identified as African American, and ~7% identified as two or more races, as shown in Table 3.

| Race | n | Percent of total |
|---|---|---|
| White | 122 | 67 |
| African American | 38 | 21 |
| Hispanic / Latino | 14 | 8 |
| Asian | 13 | 7 |
| Two or more | 12 | 7 |
| American Indian / Alaska Native | 4 | 2 |
| Native Hawaiian / Pacific Islander | 1 | 1 |

Table 3: Respondent race


Modeling

In our baseline (i.e., simple) logit model, the utility \(u_j\) for alternative \(j\) is modeled as a linear combination of the attributes of the alternative. The mathematical representation of the utility function is:

\[u_j = \beta_1 x_j^{\mathrm{Price}} + \beta_2 x_j^{\mathrm{QuestionLimit}} + \beta_3 \delta_j^{\mathrm{FunctionalityMedium}} + \beta_4 \delta_j^{\mathrm{FunctionalityHigh}} + \varepsilon_j\]
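One way to estimate this baseline specification in R is with the logitr package; the sketch below is illustrative, since the report’s estimation code is not shown here and the data frame name cbc_data, the choice column, and the observation ID column are all assumed names for the cleaned long-format choice data (one row per alternative).

```r
library(logitr)

# Baseline multinomial logit; `cbc_data`, `choice`, and `obsID` are assumed
# names for the cleaned long-format choice data.
simple_model <- logitr(
  data    = cbc_data,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "question_limit",
              "functionality_medium", "functionality_high")
)
summary(simple_model)  # coefficient estimates and standard errors (cf. Figure 7)
```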

Baseline Coefficient Interpretations

  1. Price: The negative coefficient indicates that as the price of an alternative increases, its utility decreases, making it less likely to be chosen. For every one-unit increase in price, the utility decreases by 0.188 units. This reflects consumer sensitivity to cost, where higher prices reduce the attractiveness of an option.

  2. Question Limit: The positive coefficient suggests that an increase in the question limit enhances the utility of the alternative. For every additional unit increase in question limit, utility increases by 0.002 units. This implies that consumers value having a higher question limit, associating it with greater functionality or value.

  3. Functionality Medium: This positive coefficient indicates that alternatives with medium functionality provide an additional 1.517 units of utility compared to the baseline functionality (low functionality). Consumers perceive medium functionality as a meaningful improvement over low functionality.

  4. Functionality High: This coefficient, larger than the coefficient for medium functionality, indicates that alternatives with high functionality provide an additional 2.534 units of utility compared to the baseline functionality (low functionality). High functionality is even more preferred than medium functionality, and the larger coefficient reflects the strength of this preference.

The summary of the simple logit model coefficients below (Figure 7) indicates the relative importance of each attribute in determining utility. The negative estimate for price (-0.188) shows that higher prices decrease utility significantly, while its small standard error (0.009) suggests precise estimation. The positive coefficient for question limit (0.002) reflects that higher limits increase utility, with strong statistical significance. The medium functionality (1.517) and high functionality (2.534) coefficients demonstrate that functionality is a major driver of utility, with high functionality being more impactful, and their standard errors show confidence in these estimates.

Figure 7: Summary table of simple utility coefficient estimates with standard errors


In addition to the baseline model, mixed-logit and sub-group models were fit to the data. The mixed-logit model used log-normal distributions for the price and question-limit parameters, since these attributes take continuous, non-negative values within the option space. The coefficient summary for all models is shown below in Table 4.

The subgroup model used two subgroups divided by respondent age, excluding participants who did not share their age. The two age subgroups were: Younger, 18-44 years of age; and Older, 45 years of age and above. The number of subgroups and the specific thresholds were determined based on sample sizes; splitting at roughly 45 years of age yields two groups with sufficient and approximately balanced response counts. With more survey respondents, additional group granularity could be possible.
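Under the same naming assumptions as the baseline sketch, the two extensions could be specified as follows. The age_group column and the *_Older interaction construction are illustrative, not the report’s exact code, and the signs and scaling of the log-normal terms may need adjustment depending on how the attributes are coded.

```r
# Mixed logit with log-normal heterogeneity on price and question limit
# ("ln" requests a log-normal distribution in logitr).
mixed_model <- logitr(
  data     = cbc_data,
  outcome  = "choice",
  obsID    = "obsID",
  pars     = c("price", "question_limit",
               "functionality_medium", "functionality_high"),
  randPars = c(price = "ln", question_limit = "ln")
)

# Sub-group model: interact each attribute with an Older (45+) indicator and
# include the interactions alongside the base terms.
cbc_data <- within(cbc_data, {
  is_older                   <- as.integer(age_group == "Older")  # assumed age_group column
  price_Older                <- price * is_older
  question_limit_Older       <- question_limit * is_older
  functionality_medium_Older <- functionality_medium * is_older
  functionality_high_Older   <- functionality_high * is_older
})
subgroup_model <- logitr(
  data    = cbc_data,
  outcome = "choice",
  obsID   = "obsID",
  pars    = c("price", "question_limit",
              "functionality_medium", "functionality_high",
              "price_Older", "question_limit_Older",
              "functionality_medium_Older", "functionality_high_Older")
)
```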

| Parameter | Simple Logit | Sub-group Logit | Mixed Logit |
|---|---|---|---|
| price | -0.188 | -0.180 | -15.768 |
| question_limit | 0.002 | 0.002 | -6.661 |
| functionality_medium | 1.517 | 1.608 | 1.121 |
| functionality_high | 2.534 | 2.687 | 1.790 |
| price_Older | - | -0.028 | - |
| question_limit_Older | - | -0.001 | - |
| functionality_medium_Older | - | -0.264 | - |
| functionality_high_Older | - | -0.473 | - |
| sd_price | - | - | -0.438 |
| sd_question_limit | - | - | 0.096 |

Table 4: Summary table of utility coefficient estimates


Willingness to pay
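In a preference-space logit such as the baseline model, the willingness to pay for attribute \(k\) is typically computed as the ratio of its coefficient to the negative of the price coefficient:

\[\mathrm{WTP}_k = -\frac{\beta_k}{\beta_{\mathrm{Price}}}\]

Applied to the simple-logit estimates in Table 4, this gives roughly \(2.534 / 0.188 \approx \$13.5\) per month for high functionality and \(1.517 / 0.188 \approx \$8.1\) per month for medium functionality (both relative to low functionality), and about \(0.002 / 0.188 \approx \$0.01\) per additional daily question.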

Figure 8 demonstrates clear and significant preferences for the evaluated attributes. The “Question Limit” attribute shows a strong positive linear trend in WTP, indicating that consumers are willing to pay more as the question limit increases. The relatively narrow confidence intervals confirm the consistency and reliability of this preference. Similarly, the “Functionality” attribute displays a clear differentiation, with WTP increasing markedly from “Low” to “High.” The distinct confidence intervals across levels highlight the importance of functionality as a driver of consumer choice. Overall, the Simple Logit Model identifies both question limit and functionality as key attributes influencing consumer preferences, with relatively low uncertainty compared to other models.

Figure 8: Simple Model of WTP for Question Limit and Functionality Attributes


Figure 9 reveals significant consumer heterogeneity and overall weak preferences for the examined attributes. For the “Question Limit” attribute, the WTP remains close to $0 across the range of attribute levels, with wide confidence intervals indicating substantial uncertainty. This suggests that the question limit is not a significant factor influencing consumer choices. For the “Functionality” attribute, WTP slightly increases as the levels progress from “Low” to “High.” However, overlapping confidence intervals imply minimal differentiation between levels, highlighting its limited impact. Figure 10 further emphasizes the substantial variability in preferences among consumers. The wide intervals reflect high uncertainty and variability in WTP, underscoring the diverse nature of consumer preferences.

Figure 9: Mixed Logit Model of WTP for Question Limit and Functionality Attributes


Figure 10: Mixed Logit Model S.D. of WTP for Question Limit Attributes


Figure 11 highlights the importance of both attributes, though with slightly greater uncertainty compared to the Simple Logit Model. For the “Question Limit” attribute, WTP shows a consistent upward trend, suggesting that consumers are willing to pay more for increased limits. However, the wider confidence intervals indicate greater variability in preferences among different sub-groups. The “Functionality” attribute continues to demonstrate a positive relationship with WTP, as higher levels of functionality correspond to increased WTP. Confidence intervals for functionality levels remain distinct, emphasizing its strong influence on consumer choice. This model reinforces the relevance of both attributes while accounting for variations across sub-groups.

Figure 11: Sub-group Logit Model of WTP for Question Limit and Functionality Attributes


Figure 12 indicates a different preference pattern compared to the general models. For the “Question Limit” attribute, WTP shows a slight decline, and the confidence intervals are wide and overlap zero, suggesting that question limit is not a significant factor for this group. For the “Functionality” attribute, WTP increases across levels, indicating that higher functionality is valued. However, the confidence intervals are much wider than in other models, highlighting substantial variability in preferences within the older demographic. This model suggests that while functionality remains an important attribute, it may require tailored approaches to address the high variability in preferences among older consumers.

Figure 12: Sub-group Logit Model (Older Population) of WTP for Question Limit and Functionality Attributes


The WTP graphs use different types of visualizations because they are tailored to the nature of the data and the insights each model aims to highlight. Line plots are used for continuous attributes like “Question Limit” to show trends across a range, while bar or point plots are better suited for categorical attributes like “Functionality,” allowing for easy comparison across discrete levels. Standard deviation graphs are included in mixed logit models to emphasize variability in preferences among consumers, highlighting heterogeneity. Sub-group graphs, such as those for older populations, are used to focus on specific demographic differences. Each type of graph is chosen to clearly and effectively communicate the results based on the type of attribute, the model being analyzed, and the target audience’s need for understanding.

The results from the sub-group logit models, especially for the older population as shown in Figure 12, are unexpected and hard to interpret. For the “Question Limit” attribute, WTP slightly decreases, and the wide, overlapping confidence intervals suggest no clear preference. For the “Functionality” attribute, WTP increases across levels, but the very wide confidence intervals show a lot of uncertainty. These results might be caused by a small sample size for the older population, making the estimates less reliable. There could also be a lack of variation in the responses, making it hard for the model to find clear patterns. Additionally, older individuals may have unique preferences or biases that were not fully captured in the model. Finally, issues with data quality, like inconsistent answers, could also play a role. These unusual results suggest the need to review the data and model to better understand what caused them.

Market Simulation and Scenarios

The simulated market scenario evaluates the competitive positioning of four AI-enabled products: Apple Siri, ChatGPT, Google Gemini, and Our Product. These products are assessed based on three critical attributes: Price, Functionality, and Question Limit, reflecting key consumer decision-making factors for AI services. The input values for the simulation were carefully chosen based on official documentation, publicly available data, and reasonable assumptions where direct information was unavailable.

  • Price: Apple Siri: $0; ChatGPT: $19.99; Google Gemini: $19.99; Our Product: $15. These prices were validated against official sources to ensure accuracy ([1], [2], [3]).
  • Functionality: Apple Siri was assigned “low functionality” (functionality_medium = 0, functionality_high = 0), ChatGPT was assigned “medium functionality” (functionality_medium = 1, functionality_high = 0), and Google Gemini was assigned “high functionality” (functionality_medium = 0, functionality_high = 1). These assignments are based on the documented capabilities of each product ([1], [2], [3]).

  • Question Limits: Apple Siri was assigned a conservative 50 questions/day to reflect its free nature. ChatGPT and Our Product were assigned 250 questions/day, aligning with ChatGPT’s Plus plan. Google Gemini was assigned a higher limit of 500, reflecting its enterprise-oriented design and enhanced capabilities ([2], [3]).
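As a rough cross-check on these inputs, the multinomial logit share rule can be applied directly to the simple-logit point estimates from Table 4 and the scenario values listed above. The sketch below is not the simulation code behind Figure 13 (which uses the fitted model and reports confidence intervals), so its shares will differ somewhat from the reported values.

```r
# Deterministic utilities and logit shares for the four products, using the
# simple-logit point estimates from Table 4 and the scenario inputs above.
beta <- c(price = -0.188, question_limit = 0.002,
          functionality_medium = 1.517, functionality_high = 2.534)

scenario <- data.frame(
  product              = c("Apple Siri", "ChatGPT", "Google Gemini", "Our Product"),
  price                = c(0, 19.99, 19.99, 15),
  question_limit       = c(50, 250, 500, 250),
  functionality_medium = c(0, 1, 0, 1),
  functionality_high   = c(0, 0, 1, 0)
)

V <- as.matrix(scenario[names(beta)]) %*% beta          # utility of each product
scenario$share <- as.numeric(exp(V) / sum(exp(V)))      # multinomial logit share rule
scenario[, c("product", "share")]
```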

Figure 13: Simulated Market Shares with 95% Confidence Intervals


Insights from Figure 13:

  • Apple Siri: Captures the highest market share (54%) due to its free pricing model, despite low functionality.
  • Google Gemini: Secures 26% of the market due to its premium features, appealing to advanced users despite its higher price.
  • ChatGPT: Achieves 9% market share, reflecting its medium functionality and subscription price.
  • Our Product: Captures 11% of the market, indicating opportunities for improvement in pricing and functionality.

Opportunities for Increasing Demand

Key drivers for enhancing adoption include:

  1. Price Sensitivity: Apple Siri and Google Gemini demonstrate the importance of competitive pricing. Reducing Our Product’s price to $10 could significantly boost its market share.

  2. Functionality Upgrades: Enhancing functionality from “Medium” to “High” would position Our Product more competitively against Google Gemini.

  3. Question Limit Expansion: Increasing the question limit to 500 could appeal to niche segments seeking higher usage flexibility, though its impact is less significant compared to price and functionality.

Sensitivity Analysis

Market Share vs. Price

Figure 14 highlights the sensitivity of Our Product’s market share to price changes. At $10, it captures 28% of the market, whereas at $20, market share drops below 10%, emphasizing the importance of competitive pricing.
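A simplified version of this price sweep (and, by extension, the one-at-a-time variation behind the tornado plot) can be approximated with the same point estimates. The inputs from the market-simulation sketch are repeated here so the block runs on its own; it is a stand-in for the actual simulation, so the exact shares will differ from Figure 14.

```r
# Repeat the coefficients and scenario from the market-simulation sketch.
beta <- c(price = -0.188, question_limit = 0.002,
          functionality_medium = 1.517, functionality_high = 2.534)
scenario <- data.frame(
  price                = c(0, 19.99, 19.99, 15),
  question_limit       = c(50, 250, 500, 250),
  functionality_medium = c(0, 1, 0, 1),
  functionality_high   = c(0, 0, 1, 0),
  row.names            = c("Apple Siri", "ChatGPT", "Google Gemini", "Our Product")
)

# Sweep Our Product's monthly price and recompute its logit share at each point,
# holding the competitors fixed.
prices    <- seq(5, 25, by = 1)
our_share <- sapply(prices, function(p) {
  scenario["Our Product", "price"] <- p
  V <- as.matrix(scenario[names(beta)]) %*% beta
  (exp(V) / sum(exp(V)))["Our Product", 1]
})

plot(prices, our_share, type = "l",
     xlab = "Our Product price ($/month)", ylab = "Simulated market share")
```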

Figure 14: Sensitivity of Market Share to Price Changes


Tornado Plot Analysis

Figure 15 provides a detailed view of how variations in price, functionality, and question limits influence market share:

  • Price: Exerts the greatest impact, with market share varying from 20% to 45%.

  • Functionality (Medium to High): Results in a notable increase, ranging from 12% to 32%.

  • Question Limit (250 to 500): Provides a smaller but measurable impact, ranging from 10% to 28%.

Figure 15: Tornado Plot of Sensitivity to Product Attributes

Recommendations

  1. Pricing Strategy: Reduce the price to $10 to achieve a 28% market share, making Our Product competitive with ChatGPT.

  2. Functionality Enhancement: Invest in upgrading functionality to High to directly compete with Google Gemini.

  3. Question Limit Increase: Extend the question limit to 500 to cater to heavy users and differentiate the product.

Final Recommendations and Conclusions

Key Insights

  • Price Sensitivity: Price remains the dominant factor influencing consumer choice. A price point higher than $15 results in substantial losses in market share, as shown in the sensitivity analysis.

  • Functionality: High functionality is crucial for competing with premium offerings like Google Gemini, which successfully balances price and features.

  • Question Limit: While a secondary driver, increasing the question limit could enhance perceived value among heavy users.

Recommendations

  • Pricing Strategy: Adjust the price of our product to $10 to maximize market share while remaining competitive.

  • Functionality Upgrade: Invest in upgrading functionality to High, allowing direct competition with Google Gemini in the premium segment.

  • Question Limit Enhancement: Increase the question limit to 500, differentiating the product and appealing to specific segments.

Opportunities for Growth

  • Target Premium Users: By offering high functionality at a competitive price, our product can attract users willing to pay for advanced features.

  • Bundling: Integrating additional services or partnerships can increase perceived value.

  • Marketing: Highlight affordability and advanced functionality in promotional campaigns.

Limitations and Uncertainties

  • Sample Representation: Insights are derived from a specific respondent pool, which may not fully represent diverse consumer segments. Expanding the survey could improve generalizability.

  • Market Dynamics: The analysis assumes static competitor offerings, which may change due to innovation or new product launches.

  • Attribute Scope: Attributes like integration capabilities, privacy, and reliability (brand trust) were not considered but could influence adoption.

  • Stated vs. Actual Preferences: Findings are based on survey data and may not fully capture real-world purchasing behavior. Future studies involving real-world trials could provide additional validation.

Appendix

Final survey

The snapshot below shows the final survey that was fielded and completed by respondents. The conjoint choice questions that do not render below follow the form described and shown above.

Mobile AI Functionality: Final Survey

Author

Taekwon Choi, Stephen Hilton, Bryce Huffman, Vidyullatha KS, Faizan Mufti

Published

November 17, 2024

Welcome

Welcome and thank you for taking our survey. We are a student group from The George Washington University, and we are interested in what would make having an AI software service on your smartphone desirable to you.

Education information

1. Price: The amount you are willing to pay per month for the AI software service.

2. Functionality: The AI software can assist with or complete various tasks such as posting on social media, reading text messages, ordering food, making reservations, transcribing calls, and more.

The functionality of the AI is categorized into three levels: Low, Medium, and High.

  • Low functionality: Complete a task within one source (e.g., edit grammar of an email draft)

  • Medium functionality: Complete a task with up to three sources (e.g., check calendar to schedule a video call via email)

  • High functionality: Complete a task with access to all relevant sources (i.e., cross-reference calendar, email, messages, and a flight booking website to help book the best flight)

Note

If a task requires multiple sources, the AI tool will be able to seamlessly work between each.

3. Question limit: The number of questions or requests that you can ask the AI software per day (e.g., each input, such as “can you revise this email?” counts as a question)

Choice Practice

We’ll now begin the choice questions. For each question below, please choose the option you would most prefer.

*
*
*

Choice Question Intro

Great work!

We will now show you 6 sets of choice questions. Please select the option you would most prefer.

Click the next button to begin.

Question 1

*

Question 2

*

Question 3

*

Question 4

*

Question 5

*

Question 6

*

Demographic information

Awesome, thank you for completing the choice questions!

Now, please fill out the demographic questions below.

*
*
*
*
*
*
*

Other questions

Almost there! Please complete the last few questions below to finish the survey.

*
*
*
*

End Page

You’ve completed the survey. Thank you for your participation!

You may close the window now.