tocol XForm. In cases where a new IVR rendering feature did have to be added, this rendering feature was then externalized as an XForm attribute so that it could be tweaked from within the XForm. ODK Voice also automatically determined the set of recorded prompts necessary to render a new or modified form and guided recording of these prompts over the phone.

To support our deployment, an outbound call scheduling system was incorporated into ODK Voice. This system automated scheduling of outbound calls in certain time windows and automatic retry for unanswered calls.

ODK Voice can be hosted on any server and can connect to regional cell networks through either a voice-over-IP (VoIP) provider or a hardware gateway. ODK Voice specifies voice dialogues using VoiceXML, a standard with many client implementations. The only operating expenses for an ODK Voice instance are for the server and cellular network usage charges. Figure 2 illustrates the choices of voice and data infrastructure that can be used with ODK Voice.

Figure 2: A diagram of the hardware/software infrastructure for ODK Voice. ODK Voice uses VoiceXML to specify the audio dialogues that are played by a VoiceXML engine and transmitted via a telephone gateway to the cellular network. Collected data is sent to an XForms backend for viewing
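The outbound scheduling behaviour described for ODK Voice (calls placed inside a time window, with automatic retries for unanswered calls) can be sketched as follows. This is an illustration under stated assumptions, not the ODK Voice implementation: the retry count, retry interval and 24-hour text message lead time are the values reported in the study methodology, while the calling window hours, the function names, and the rule that a retry falling outside the window rolls over to the next window opening are hypothetical.

```python
from datetime import datetime, time, timedelta

# Values reported in the study methodology.
MAX_RETRIES = 4                       # unanswered calls retried up to 4 times
RETRY_INTERVAL = timedelta(hours=2)   # retries spaced 2 hours apart
SMS_LEAD = timedelta(hours=24)        # warning text sent 24 hours before the call

# Hypothetical calling window; the real window hours are an assumption here.
WINDOW_START = time(9, 0)
WINDOW_END = time(17, 0)


def in_window(moment):
    """Return True if the moment falls inside the daily calling window."""
    return WINDOW_START <= moment.time() <= WINDOW_END


def next_window_start(moment):
    """Return the next window opening for a moment that is outside the window."""
    opening = datetime.combine(moment.date(), WINDOW_START)
    if moment.time() > WINDOW_END:
        opening += timedelta(days=1)
    return opening


def plan_attempts(first_call):
    """List the first attempt plus every retry, assuming no attempt is answered."""
    attempts = []
    moment = first_call
    for _ in range(1 + MAX_RETRIES):
        if not in_window(moment):
            # Assumption: a retry outside the window waits for the next opening.
            moment = next_window_start(moment)
        attempts.append(moment)
        moment = moment + RETRY_INTERVAL
    return attempts


def sms_time(first_call):
    """Time at which the warning text message should be sent."""
    return first_call - SMS_LEAD


first = datetime(2010, 1, 4, 13, 0)
schedule = plan_attempts(first)
```

In a real deployment the loop would stop as soon as a call is answered; the sketch only enumerates the worst case so that the window and retry rules are easy to inspect.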
first version of the Project [...] 4 questions including touchtone-based multiple-choice, numeric, and recorded audio answer formats. The survey was tested with a small group of volunteers in Uganda, immediately followed by interviews.

These calls and interviews led to a number of qualitative conclusions. First, users who didn't receive a text message warning in advance were completely unable to make sense of the survey call when it arrived. Furthermore, even with a text message, users tried to get the speaker's attention when a call was received by saying things like "Hello? Hello?", missing the initial instructions. A 2-second chime sound effect was added at the beginning of calls to encourage users to listen, followed immediately by "This is a recorded call from Project WET. You are not talking to a real person." Based on our initial tests, we estimate the success rate without text message or chime to be close to 0.

Users reported that the most difficult part of the interface was using the keypad during a phone call; they said it would be much better if they didn't have to use the keys. Users also had difficulty understanding when to speak and when to use the keypad. We modified the instructions to clearly tell users to either "Press the 1 button on your phone to ..." or "Please say your answer after the beep."

This improved survey, which used both touchtone and voice input, was delivered to 20 participants. The success rate was [...]tions correctly. 55% of participants failed to succeed at even the first input task - pressing 1 to begin the survey - even when the instructions explicitly said to press the 1 button to begin the survey (see footnote 4). Based on these results, we chose to switch to an entirely voice-based UI.

The voice-based survey contained 3 recorded audio questions that attempted to capture information similar to the previous version. Of 70 participants who received this version of the survey, the overall complete and partial success rates are 30% total. If we exclude the calls that failed due to factors external to the interface (hang-ups, environmental factors, wrong person), the complete and partial success rates are [...] total. This success rate is several times higher than that of [...], and we saw a dramatic qualitative improvement in user performance with this interface.

The voice-only UI was then redesigned based on observations of recorded calls from [...] to produce [...]. The initial instructions and the question prompts were reduced in length, and the question prompts were rewritten to be conversational rather than instructional, with a focus on turn-taking conventions. For example, [...] contained the following prompt:

"After you hear the beep, please say your name and the name of the school where you work. When you stop talking, the survey will continue. [beep]"

which was replaced in [...] by

"What is your name? [beep]"

[...] was tested with only [...] (87% total) excluding external factors.
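The success rates quoted in this section are computed over classified call outcomes, once over all answered calls and once with failures external to the interface removed. A minimal sketch of that calculation is given below. The category labels, the function name and the call log are hypothetical and are not data from the study; only the grouping of hang-ups, environmental factors and wrong-person calls as external factors follows the text.

```python
from collections import Counter

# Outcome categories treated as external to the interface
# (hang-ups, environmental factors, wrong person). Label names are assumed.
EXTERNAL = {"early_hangup", "environment", "wrong_person"}


def success_rates(outcomes, exclude_external=False):
    """Return complete, partial and combined success rates as fractions."""
    counts = Counter(outcomes)
    if exclude_external:
        for category in EXTERNAL:
            counts.pop(category, None)
    total = sum(counts.values())
    if total == 0:
        return {"complete": 0.0, "partial": 0.0, "total": 0.0}
    complete = counts["success"] / total
    partial = counts["partial_success"] / total
    return {"complete": complete, "partial": partial, "total": complete + partial}


# Hypothetical call log, NOT data from the study.
calls = (["success"] * 3 + ["partial_success"] * 2 + ["user_failure"]
         + ["early_hangup"] * 2 + ["environment"] + ["wrong_person"])

overall = success_rates(calls)
interface_only = success_rates(calls, exclude_external=True)
```

Reporting both figures separates what the interface itself can influence from failures it cannot, such as a call reaching the wrong person.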
Finally, survey [...] was identical to [...] except that the prompts were recorded by a native Ugandan who spoke with a Ugandan accent and dialect. Of 49 participants who received this version of the survey, the overall complete and partial success rates were [...]. Excluding external factors, the complete and partial success rates were [...] into the user failure category; every user answered at least [...] had significantly higher success and success/partial-success rates than both [...] and [...] (p < [...]). [...] to [...] was not statistically significant - [...] was abandoned after a small sample size - [...]

4 These results may be overly pessimistic since the prompts were not recorded by a native speaker. Nonetheless, these results measure poorly even against the first voice-only UI, which was not recorded by a native speaker.

[...] an ASR-based voice UI prototype but required "significant prompting and encouragement [from the experimenter] to press any key" in a graphical mobile interface [13].

Combining touchtone and audio input made matters even worse: once participants learned that they were supposed to enter information on the keypad, they often did not say anything during audio input questions.

Based on our observations, we speculate that difficulties with hearing and/or comprehending the touchtone instructions, the added difficulty of moving the phone from one's ear and finding a button to press (possibly missing further instructions), unfamiliarity with using the keypad during a phone call, and failing to recognize the automated nature of the UI, all contributed to the failure of the touchtone interface.

Despite a number of attempts at improving the touchtone interface, 55% of the participants receiving a touchtone survey did not even succeed in pressing the 1 button to begin the survey, even when they were told to "Please press the 1 button on your phone to begin the survey." Instead, they said "Yes" or "1" or "Yes, I am ready" or simply hung up after hearing the instructions. In the cases where calls were at least carried out to completion (successfully or unsuccessfully), they typically took [...] (about 3 minutes for the voice versions) because participants had to hear each question multiple times before they would press a button. This may have been a useful learning experience for participants, but was almost certainly also a frustrating one. Finally, considering the low success rate, it is likely that even the successfully completed surveys had a low degree of accuracy. These results suggest that without at least some initial training, a touchtone interface is infeasible for this target population.

We should emphasize that this work makes no claims about the usability of ASR-based UIs, which present a host of challenges themselves such as recognition accuracy and limited vocabulary (see e.g. [...]). [...] recorded audio UIs are feasible, but how they can be further automated and scaled is not addressed here.

Outbound vs. Inbound Calling

Although having an IVR system call participants - rather than having participants initiate the call - was financially advantageous, we found that it introduced additional usability problems, which were only partially offset by the use of text message warnings.
First, participants were often in an environment not conducive to a successful IVR interaction. These environmental factors included loud background noise, external distractions such as conversations with third parties, and intermittent call quality. Second, participants generally did not understand that they were interacting with an automated system, and tried to initiate conversation with the system. These problems were partially offset by the strategies described below.

Automated Call Preparation and Initiation

One thing that became clear from the initial testing was the importance of the text message warning. Each of the Ugandans interviewed cited the importance of the text message to prepare them for the call. Participants who were sent a call without receiving a text message warning were confused and would hang up after a few seconds of failed attempts to start a conversation with the recorded voice.

Despite the text message warning, participants generally did not immediately realize the nature of the IVR calls; we found that no matter how we began the survey dialogue, participants repeatedly said "Hello, hello? Who is this?", trying to establish a conversation, and thus missed the instructions. We found that beginning the call with a sound effect such as a chime, followed by "This is a recorded call. You are not talking to a real person." effectively captured the attention of users and compelled them to listen to the instructions.

Leveraging Implicit Dialogue and Turn-Taking Conventions

The success of the second voice interface suggests that leveraging the conversational and turn-taking conventions of normal conversation is much more successful than detailed instructions in eliciting desired user behavior.
In the first version of the voice survey, detailed instructions were provided at the beginning of the survey and questions were asked as statements (e.g. "After the beep, please say your name and the name of the school where you work"). In the final version, we asked questions as questions (e.g. "What is your name?") and relied on turn-taking to signal when the user was supposed to speak. This turned out to be much more successful. Users with limited understanding of English have a hard time understanding complex instructions, and the "talk after the beep" convention is not understood in Uganda, where voicemail is rarely used; conversely, all users know to speak when they are asked a question. Interestingly, in contrast to previous versions, participants were usually able to answer the questions even if they did not hear or understand the instructions due to call quality or background noise, because the expected response was implicit in the conversational nature of the survey.

The responses to [...] were spoken more slowly and clearly enunciated than in previous versions. The literature reports that people tend to emulate the speaking style of their conversational partner in a voice dialogue [15]. Therefore, since the prompts were recorded more slowly and in a more understandable accent, the responses were also spoken more slowly and clearly (see footnote 6).

Survey Design and Recording by Native Speakers

Even though our survey was delivered in English, we found the use of native speakers for designing and recording prompts to be extremely important. First of all, native speakers understand the vocabulary and mental model of target users. For example, we found that the phrases "Press 1 to continue" or "Press 1 on the keypad to continue" were much more difficult to understand than "Press the 1 button to continue", because users did not know "Press" referred to their phone, and "keypad" was not a common word.
Perhaps even more importantly, native speakers are able to record prompts in an accent and speaking style that is more understandable to users. Several Ugandans commented that our accent was hard for them to understand (just as their accent was hard for us to understand). Furthermore, we instructed native prompt recorders to record the prompts as if they were speaking to someone in Uganda with a poor cell connection. This resulted in a particular enunciation, intonation, and speaking rhythm that we could not have replicated, but which seemed to make the survey easier for users to follow.

6 The responses also tended to be somewhat more concise in response to the shorter prompts, but we found that response length to open-ended questions was closely tied to the recording timeout (i.e. the length of silence before the recording ended), which we tuned to 3 seconds. Essentially, users continued talking with longer and longer pauses between sentences, and expected that the speaker would interrupt them when they had said enough. When the speaker finally interrupted with "Thank you", most participants appeared pleased.

Gender Differences in IVR Task Success

The gender discrepancies observed, although not conclusive, support the findings of Medhi et al. that women have a higher [...]tively, we found that (in line with Medhi) women generally listened more quietly to the instructions and answered more slowly and clearly, whereas men tended to talk during instructions (e.g. "Hello? Hello?") and more often spoke at the wrong time, did not know what question they were supposed to answer, or were difficult to understand.

Remote IVR Application Hosting

This work demonstrates the feasibility of a cloud approach to IVR application hosting. In this approach, applications are hosted in a reliable cloud or remote location, are administered over the internet, and connect to the country's phone network through VoIP.
By hosting the application remotely, we eliminated the need for local hardware and technical expertise, and were not affected by power and network outages. The main disadvantage of this approach is cost: VoIP rates to Uganda can be over 10 [...] per minute; however, for the scale of our deployment, we found the extra usage costs (< [...]100) were much less than procuring servers and technical support in-country. We found no degradation in call quality when hosting our application in the United States and connecting to Ugandan mobile phones over VoIP. Therefore, remote hosting may be a good alternative to local hosting, especially for small-scale or prototype applications.

IVR applications in the developing world have the potential to extend ICT to the billions of developing-world users who own a mobile phone. The most serious challenge for IVR application development in this context is usability.

In this paper, we describe a study of IVR data collection UIs by untrained users in rural Uganda. Over [...] survey calls were delivered to Ugandan teachers to collect feedback on a water education training. These calls were analyzed both qualitatively (listening to recorded call transcripts) and quantitatively (measuring task success rates), and informed UI changes in an iterative design process. Changes to the survey [...] of more general design principles for IVR interfaces designed for similar populations.

We see several opportunities for further study of IVR data collection interfaces with untrained users. First, further work is required to determine if and how conversational voice input can be used by an automated IVR interface. We found that UIs based on recorded voice input (rather than DTMF) were successful for untrained users, but it is unclear if and how this input could be interpreted using ASR.

Second, the accuracy of IVR-based data collection in the developing world has not yet been characterized. Patnaik et al.
found that live operator data collection over voice outperformed graphical and SMS interfaces by an order of magnitude [17], but it remains unclear whether the improvements in data quality result from the voice modality or from the presence of a live operator. In order to answer this question, the accuracy of IVR interfaces in these environments must be determined experimentally.

There has also not been sufficient characterization of the effect of training on mobile data collection task success and accuracy. For example, Patnaik et al. observed over 95% accuracy on several UIs after hours of training, while we found that touchtone entry failed with no training. The tradeoff between training time and task success or accuracy on a particular interface has not been examined.

IVR applications in the developing world have the potential to connect billions of users to previously inaccessible automated services. For this potential to be realized, there remains much work to be done to develop the technology and the design principles necessary for these applications to be usable by these unreached populations.

7. ACKNOWLEDGMENTS

This work would not have been possible without the collaboration of the Project WET Foundation, particularly John Etgen and Teddy Tindamanyire. We would also like to thank Bill Thies and Neal Lesh; Gaetano Borriello, Yaw Anokwa, Carl Hartung and Waylon Brunette of Open Data Kit at University of Washington; and the JavaRosa team.

[2] EpiSurveyor. http://datadyne.org
[3] FrontlineSMS. http://www.frontlinesms.com, Apr. 2010.
[4] Open Data Kit. http://code.google.com/p/opendatakit, Apr. [...]
[6] Y. Anokwa, C. Hartung, W. Brunette, A. Lerer, and G. Borriello. Open source data collection in the developing world. 2009.
[7] M. H. Cohen, J. P. Giangola, and J. Balogh. Voice User Interface Design. Addison-Wesley, Boston, Massachusetts, first edition, 2004.
[8] A. S. Grover, M. Plauche, E. Barnard, and Kuun. HIV health information access using spoken dialogue systems: Touchtone vs. speech. In Proc. International Conference on Information and Communications Technologies and Development, pages [...]
[9] Kabutana Trust of Zimbabwe. Freedomfone. http://www.freedomfone.org, Apr. 2010.
[10] J. Klungsoyr, P. Wakholi, MacLeod, A. Escudero-Pascual, and N. Lesh. OpenROSA, JavaROSA, GloballyMobile - collaborations around open standards for mobile applications. In Proceedings [...]

Evaluation of IVR Data Collection UIs for Untrained Rural Users

Adam Lerer
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, USA

molly.[email protected]
projectwet.org

Saman Amarasinghe
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, [...]

ABSTRACT

Due to the rapid spread of mobile phones and coverage in the developing world, mobile phones are being increasingly used as a technology platform for developing-world applications including data collection. In order to reach the vast majority of mobile phone users without access to specialized software, applications must make use of interactive voice response (IVR) UIs. However, it is unclear whether rural users in the developing world can use such UIs without prior training [...] or improve usability for these target populations.

This paper presents the results of a real-world deployment of an IVR application for collecting feedback from teachers in rural Uganda. Automated IVR data collection calls were delivered to over 150 teachers over a period of several months. Modifications were made to the IVR interface throughout the study period in response to user interviews and recorded transcripts of survey calls. Significant differences in task success rate were observed for different interface designs (from [...] participants were not able to use a touchtone or touchtone-voice hybrid interface without prior training. A set of design recommendations is proposed based on the performance of several tested interface designs.

1. INTRODUCTION

In the past several years, there has been a growing adoption of mobile phone technology as a tool for solving a variety of challenges in international development, including health delivery, disaster management, microbanking, sanitation, and education. This new focus on technology is a result of the explosive growth of mobile phone usage and coverage throughout the developing world.
As of 2008, there were [...] compared to just [...]; 60% of these mobile phone users live in the developing world [25]. Millions of people, many of whom have never used a computer and earn only a couple dollars a day, now own their own mobile phone; this trend is enabling a wide range of potential technological solutions that were not possible a decade ago.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM [...], December 00.

One important use of mobile technology in the developing world is data collection. Collecting data in the developing world presents a number of unique challenges: a diffuse rural population, low literacy and education, and a lack of financial resources. Recently, a number of organizations and projects have successfully used mobile phone and PDA software in place of paper-based methods for data collection [...]. [...] require access to particular mobile phones running particular software. This presents limitations in every application area: health reporting and advice, disaster reporting, microfinance, and project feedback must all be intermediated by specially trained and equipped surveyors, limiting the usefulness and scalability of these services.

Expanding the reach of mobile data collection to all mobile phone users requires the use of either voice or SMS modalities, since these are available on nearly all mobile phones.
Of these, only voice is suitable for answering an extended series of questions (although SMS can be used for very simple data collection protocols). Therefore, an interactive voice response (IVR) platform for rendering data collection protocols is the natural choice for expanding the reach of data collection beyond customized smartphones and PDAs.

In addition to expanded reach, voice-based data collection has several additional advantages. First, using voice-based communication circumvents the serious incentive hurdles in more common, SMS-based ICTD programs (e.g. those using FrontlineSMS [3]), since a phone call initiated by the survey application does not incur a cost to the respondent. Second, there is preliminary evidence that data collected over voice in resource-poor areas may be more accurate than data collected by either [...]. Finally, studies have shown that data collection through an automated voice system is significantly more effective at obtaining sensitive information than a live interviewer [14, 24].

However, even in the best of circumstances, voice interfaces present usability challenges such as the conventions of spoken language, limitations of speech recognition, limitations of human cognition and working memory, and differences between users [23]. These usability problems are [...]

[...] not want teachers to have to pay for SMS usage to provide feedback. Calling teachers with an automated voice survey circumvented this problem because mobile phone users are not charged for received calls in Uganda (and most other countries). Furthermore, a voice survey could collect more detailed information than could be sent in a 160-character SMS message.

The purpose of the survey was to collect information from teachers about their use of the Project WET materials and training. The survey asked whether and how the Project WET materials had been used, and what results had been observed from using the materials. Teacher names were also recorded to verify that the correct user had been contacted.
The survey was delivered in English, which was spoken by all of the participating teachers.

Figure 1: A Project WET teacher training in Uganda. Photo courtesy of Project WET Foundation.

Calls were scheduled between [...] and [...] local time, with unanswered calls being retried up to 4 times in intervals of 2 hours. Each survey call was preceded by the following text message sent 24 hours in advance of the call.

"Hello! This is Project WET in the USA. Please expect a recorded survey call on [Thursday]. Help us by answering the questions. Thank you!"

The first version of the survey was designed and tested by us and members of the Project WET teams. Feedback was then solicited from volunteer testers in Uganda - the supervisors of the teachers being surveyed - and used to improve the UI. Finally, calls were delivered to teachers, and several additional UI iterations were performed based on listening to recordings of those calls. The users for each UI iteration were completely non-overlapping.

Call recordings were all listened to in their entirety, and call outcomes were classified into one of the following categories: success, partial success (footnote 1), user failure (i.e. interface failure), early hangup (footnote 2), environmental factors and call quality (footnote 3), and wrong person. Calls that were never answered were excluded from the count.

1 Calls in which some of the questions were answered correctly.

2 Calls in which the user hung up near the beginning of the survey instructions. It is not clear why the users hung up in these cases, but many were likely unavailable to take the call, since calls were delivered during working hours.

3.2 Ethnography

Since this work is based on a real-world deployment necessitated by the difficulty of collecting data, we cannot have precise ethnographic data for our participants. The following information was provided by a Project WET Coordinator in Uganda who works regularly with the Ugandan teachers.

The teachers are from mostly rural schools in central and northern Uganda. The majority of teachers have English as a second language. "Some teachers speak good English but may not write good English; likewise, they may understand...English spoken by the people from their locality but find it difficult to get the accent from other parts of the world. On the whole, primary teachers have a moderate understanding of English." Teachers undergo eleven years of education; some may have gone to college or technical training. Most teachers have used computers during their schooling, but most do not have computers of their own. Almost all teachers, however, own a mobile phone. The majority of teachers have never used an IVR interface before; even voicemail is not common among these users. Approximately [...]

[...] developed for use by organizations in the developing world. ODK Voice was designed as both a platform for creating IVR data collection interfaces suited to the needs of developing-world organizations, and as a prototyping tool for IVR interfaces.

ODK Voice was developed as part of the open source Open Data Kit ([...]), a set of open-standards-based mobile data collection applications for the developing world, centred around the Open[...] therefore able to operate with the same forms used by other OpenROSA mobile data collection applications, and integrates with existing XForms data aggregation and analysis tools such as ODK Aggregate.

ODK Voice allowed us to iterate and evaluate fully-functional IVR data collection applications very rapidly, because it achieved a separation of concerns between protocol specification and rendering. Data collection protocols were specified generically with the XForms specification language, and rendering of each question was handled automatically by ODK Voice based on question type. A number of complex protocol features could be encoded as part of the XForm with no changes to the IVR software, including multi-lingual support, a variety of touchtone input types and audio recording, branching, constraints, and other form logic. Customizations specific to the IVR rendering of a form could be achieved without software modifications or changes to the underlying XForm behavior using an extensible set of rendering attributes specific to ODK Voice.
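To make the separation between protocol specification and rendering more concrete, the sketch below shows how a single recorded-audio question might be rendered as a VoiceXML fragment. It is an illustration only and not ODK Voice's actual renderer: the function name, the `maxtime` default and the audio file name are hypothetical. The 3-second end-of-speech silence corresponds to the recording timeout reported for the study, and `record`, `beep`, `finalsilence` and `maxtime` are standard VoiceXML 2.0 names.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def render_audio_question(name, prompt_audio, final_silence="3s", max_time="60s"):
    """Render one recorded-audio question as a VoiceXML document string.

    final_silence is the pause that ends the recording; max_time is a
    hypothetical upper bound on the length of an answer.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": name})
    record = ET.SubElement(form, "record", {
        "name": name,
        "beep": "true",                 # play a beep before recording starts
        "finalsilence": final_silence,  # silence that signals the end of the answer
        "maxtime": max_time,
    })
    prompt = ET.SubElement(record, "prompt")
    # The question itself is a pre-recorded audio prompt (file name assumed).
    ET.SubElement(prompt, "audio", {"src": prompt_audio})
    return ET.tostring(vxml, encoding="unicode")


document = render_audio_question("teacher_name", "what_is_your_name.wav")
```

Keeping values such as the silence timeout as parameters mirrors the idea of rendering attributes: the dialogue can be tuned without touching the form definition or the rendering code.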
As a result of this design, we were able to accomplish most of our UI modifications by simply modifying the pro-

3 Calls that failed because the connection was extremely poor or intermittent, or in which the user said "I'm busy, call me back later" before hanging up.

OV: [Intro Music] This is a recorded call from Project WET. You are not talking to a real person. This call will record your answers to three questions about your Project WET training. After each question, you will hear this sound: [beep]. After this sound, say your answer. When you are finished, stop talking and wait for the next question.
User: Since the training, we divided the Project WET materials to all schools. They are displayed in the schools and they are used for reading and for practicing in the schools.
[...] student behavior have you noticed after using the Project WET materials?

User: More students are now cleaning their hands before eating and after eating. And they now know ...

Call 1: A sample call from [...].

[...] but we observed a dramatic qualitative improvement in user performance between these two versions. Table 1 provides a complete breakdown of call outcomes. Figure 3 compares task success rates for the three UIs quantitatively evaluated.

Success [...] interface.

Women had greater task success than men. In the first voice interface, men were at least partially successful 45% of the time; women were at least partially successful 85% of the time (footnote 5). The gender discrepancy in success rate in a Fisher's exact test (p = .09 two-tailed) did not meet the standard significance criterion (p < [...]). Results are shown in Table [...] by gender.

We make no claims on the external validity of the survey methodology. Reporting bias could have been introduced if the participating teachers felt that it was in their interest to report positive results, particularly since the survey was not anonymous, since Project WET provided training, materials and funding. Positive results were in fact reported from nearly all of the respondents. Most participants reported that the materials had been rolled out to students and other teachers, and that students had begun to wash their hands properly, clean water containers, etc.

5 We are only considering [...], because the sample size is too small in [...], and [...], and there are no user failures in [...].

Figure [...] by interface version.

The aggregate conclusions of the phone survey were at least in accordance with those observed directly. In approximately 40 of the surveyed schools, written feedback from teachers was elicited and direct observations of Project WET implementation and results were performed. All but one of the schools visited had been using the Project WET materials, and major changes such as handwashing were observed.

5. DISCUSSION

This work demonstrates a variation in task success rate - from practically [...] analysis provides several insights and suggests a number of
[...] particular class of users; namely, users in the developing world without prior IVR experience or training, with real-world connectivity problems and distractions, and for whom English is not a first language.

Comparison of Touchtone and Recorded Voice Interactions

The most serious usability problems with the initial Project WET survey involved understanding how and when to use the touchtone keys. In our initial interviews with participants in Uganda, we received feedback such as "It was very good, but the buttons were very hard. It would be better if you could get rid of the buttons", and "Pressing the buttons did not work for me." Many participants did not press any keys or did so only with significant prompting, and most participants who did press keys made a number of mistakes throughout the interaction. This observation is in line with Medhi et al., who found that subjects responded well to [...]

[...] exacerbated by a user population who lacks experience using voice interfaces or even other automated interfaces, and who often have a low level of education and literacy [13].

The investigation of these usability challenges and their solutions in a live [...] is the main contribution of this work. Evaluation of several IVR UIs was performed through interviews with volunteers and observation of recorded calls from over 150 survey participants, using a platform we developed for rapid development of IVR data collection applications in the developing world.

Section 2 summarizes related work on voice interfaces and evaluation of these interfaces in the developing world. Section 3 describes our study methodology, the study participants, and details of our [...] the results of our evaluation of several IVR interfaces. Section 5 discusses the outcomes of the study and proposes several general design principles for IVR interfaces targeted at these users. Section 6 provides concluding remarks and suggests areas of further research.

2. RELATED WORK

There is a large body of work on voice interfaces in the developed world.
Commercial interfaces tend to focus on simple task completion, particularly for call center operation. Several authors have provided guidelines for creating usable voice interfaces (e.g. [23]), with many ideas drawn from the field of computer-human interaction, such as the iterative design process, rapid prototyping, and heuristic and user evaluation techniques. However, most existing IVR systems designed for the developed world target neither the needs nor the usability challenges of people in resource- and literacy-poor regions [21].

A number of previous studies have designed and evaluated voice interfaces in the developing world for applications such as health reference [...], information dissemination [...], and data collection [17]. Berkeley's Tamil Market project [18] was the first speech interface that was rigorously evaluated with low-literacy users in the developing world. Developers performed user studies and interviews and recorded [...] target users. The study suggests that there are differences in task success between literate and illiterate users, but the sample sizes were too small to be conclusive.

Subsequent studies have evaluated IVR UI designs for illiterate or semi-literate users in the developing world. In particular, several studies have compared voice and touchtone input, with mixed results. Patel et al. found that subjects in Gujarat with less than an eighth grade education performed significantly better using touchtone input than speech recognition [16], and the OpenPhone team also found that "most of the low literacy caregivers...preferred the touchtone system" [12]. Sherwani et al., however, found that task success in a speech interface was significantly higher than in a touchtone interface [20], and Grover et al. reported similar user performance for a touchtone and key-press replacement voice interface. These conflicting results show that even basic IVR UI choices are highly context-dependent and require careful consideration and study.

Recent studies have also compared different types of mobile UIs for developing world users. One study involving health workers in Gujarat compared data collection accuracy using a mobile phone electronic forms interface, an SMS data encoding scheme, and transcription via a live voice operator [17]. Live operator transcription was found to be an order of magnitude more accurate than electronic forms or SMS. In a similar comparison study, the ability of low-literacy users to complete banking transactions was evaluated on a text UI, a rich audiovisual [...] rates were highest for the rich [...] and more accurate. Users were hesitant to press buttons on the phone in the rich UI and preferred a voice interaction, but they were confused in the voice UI by the inflexible system responses. These studies suggest that voice-based interactions may be preferable for users in the developing world, if they can imitate human interactions sufficiently.

Almost all previous IVR evaluations have provided training to participants, and several have cited effective training as crucial for task success with speech interfaces. Sherwani et al. and Patnaik et al. had participants complete a series of instructor-guided practice tasks to learn the interface [20, 17]. Patel et al. and Medhi et al. provided participants with a verbal explanation of the system before evaluation [16, 13]. Grover et al. had each user watch a five minute video showing how a caregiver could use the system [8]. Only the Tamil Market project reported user success without training on their Wizard-of-Oz [...] that even inexperienced users were successful because they provided information even when "no input [was] given." This strategy is not suitable for data collection applications.
In contrast with previous work, our work examines what IVR interactions are possible without training, in a real-world environment. The most promising attributes of IVR applications in the developing world are their reach and scalability, which are hampered by a dependence on prior user training. Survey/census applications such as ours would be rendered pointless by a dependence on prior live training, and citizen applications such as health information, agricultural information, and citizen reporting would ideally be spread virally without a need for individual training of each user. Furthermore, we do not predict a rise in IVR-savvy in the developing world obviating the current need for IVR training; if anything, IVR-savvy will only improve after IVR interfaces are developed that can be used without training and thus reach untrained users. Therefore, we have attempted to elucidate if and how IVR interfaces can be used successfully without prior training.

[...] organization whose mission is "to reach children, parents, educators, and communities of the world with water education" [5]. Project WET conducted a teacher training program throughout rural Northern Uganda in July and August 2009. Teachers were trained and given materials to teach students proper sanitation and personal hygiene practices. The Project WET organizers were interested in obtaining feedback from participating teachers about if and how they had used the Project WET materials in their schools and communities. The teachers were located throughout rural Uganda and were difficult to reach in person, but approximately 250 of the teachers provided mobile phone numbers at which they could be reached. Project WET originally planned to collect feedback with an SMS campaign, but did