下手の横好き世界5 by William Van Hecke

Siri Teaches Interaction Design

The green-background shots in this post, with T-Mobile in the status bar, are from my good friend Jon Bell, who has been kindly sending me his own Siri examples as I’ve worked on this post. No wonder I didn’t remember asking some of these questions!

I have been collecting Siri screenshots for a while, mainly because they were disappointing in an amusing or unexpected way. I guess it is fascinating to see how a sophisticated system fails. Over time I realized that many of these incidents could be used to teach about interaction design.

Note well that the point of this post is not to denigrate the hard work of Apple’s software creators. It’s to use the rich grounds of Siri interactions as a starting point for conversations about design.

Okay here we go~

Fuzzy Matching

IMG 1664 Ruffin

It’s really surprising to me when Siri correctly gets most but not all of a proper name, and then fails to make the connection to the actual thing sitting in my music library. If five of the words in a six-word band name match, and the non-matching one sounds very similar to the real one, isn’t it a safe bet that that is the band I meant? Especially if the alternative is to give up and say you couldn’t find it? This makes me wonder whether it’s better to be “safe” and play nothing than to guess and sometimes be wrong. I think it’s better to try playing something. It’s beyond me why, in the case of “Rough & Laugh”, she opted for the instrumental; I would have expected the most direct match possible.
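To make the "guess rather than give up" idea concrete, here is a minimal sketch of fuzzy title matching. It uses Python's standard `difflib.SequenceMatcher` to compare character sequences; the function names, the 0.8 threshold, and the sample library are assumptions for illustration, and nothing here reflects how Siri actually matches speech (a real system would more likely compare how words sound, not how they are spelled).

```python
from difflib import SequenceMatcher


def similarity(heard, title):
    """Character-level similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, heard.lower(), title.lower()).ratio()


def best_match(heard, titles, threshold=0.8):
    """Return the closest title if it is similar enough, else None.

    The threshold is a made-up number: lower it to guess more often,
    raise it to play it "safe" more often.
    """
    best = max(titles, key=lambda t: similarity(heard, t), default=None)
    if best is None or similarity(heard, best) < threshold:
        return None
    return best


# Hypothetical library and transcription, for illustration only.
library = ["Rough & Laugh", "Rough & Laugh (Instrumental)"]
choice = best_match("rough and laugh", library)
```

Because the shorter title is closer to what was heard, this sketch prefers the most direct match over the longer instrumental entry, and it returns `None` only when nothing in the library comes close.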

Wild Guesses

IMG 2086 IMG 2119 IMG 2349

Here’s the opposite of the previous problem. Wild guesses form most of my disappointing interactions with Siri. I ask a pretty reasonable question; Siri disregards some of the crucial semantic content of my request and returns a pedestrian search result about the remaining semantic content.

This is surprising because it seems like Siri is ignoring what the user cares about in favor of what Siri cares about. The commercials show celebrities chatting with Siri as if she is a person, and the conversational interface invites people to anthropomorphize her. Indeed, most people automatically started referring to Siri as “she” (or “he” in regions where the voice is male by default), even though Apple calls Siri an “it”. Voices seem human, and people have a built-in animism that makes them want to attribute intent and intelligence to everything. But the danger is that the more you portray your system as smart and human-like, the more you encourage that illusion — and the more disillusionment the user is in for when the system turns out to be just a computer after all.

Names Are Important

IMG 2087

I tried to send a text message about my friend Rachael. (You can see it behind the “Canceled” overlay.) But there are a couple of different ways to spell that name, and Siri understandably guessed the wrong one. That’s not a big deal, but it’s really surprising to me that I couldn’t tell her to correct the spelling. My only option really seemed to be to either disrespect my friend by sending a text with her name misspelled, or to give up and type it myself. When it comes to something as personal as people’s names, technology should do its best to be respectful, or at least not to be insulting.

Pronunciation of names is at least as important as spelling, and I am super impressed at Siri’s interface for learning to pronounce people’s names. You can tell that Apple is trying very hard in that area… but they could do a better job of exposing it; you have to ask in juuuust the right way or you’ll get a result like this:

IMG 2278

That has the danger of making a user think the feature doesn’t exist at all, which means that as far as that user is concerned, you might as well not have done all that hard work in the first place! This is a great place to err on the side of exposing features and making sure users know how great your product can be.

Conversations Have State

IMG 2088

Another danger of presenting Siri as a conversational interface is that people expect her to preserve contextual information from one utterance to the next, the way normal human conversations do. If you mention a noun, I can then use a pronoun to refer to that noun in my response.

Siri, though, has no idea what the antecedent of “there” is in this exchange, even though the last thing she mentioned to me was a place. Furthermore, she seems to want to try matching the word “there” against my contact list, not realizing that it’s a pronoun rather than a proper name.
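As a rough illustration of what "preserving state" could mean here, the sketch below remembers the last place the assistant mentioned and resolves a place pronoun against it before anything else (such as a contact lookup) gets a chance. The class, method names, and pronoun list are all hypothetical; this is a toy, not a description of Siri's internals.

```python
# Words treated as references to a previously mentioned place (assumed list).
PLACE_PRONOUNS = {"there", "that place"}


class Conversation:
    """Keeps one piece of conversational state: the last place mentioned."""

    def __init__(self):
        self.last_place = None

    def assistant_mentioned_place(self, place):
        # Call this whenever the assistant's reply names a place.
        self.last_place = place

    def resolve(self, phrase):
        """Return the remembered place if the phrase is a place pronoun.

        Returns None when the phrase is not a pronoun or nothing has been
        mentioned yet, so the caller can fall back to other lookups.
        """
        if phrase.strip().lower() in PLACE_PRONOUNS:
            return self.last_place
        return None


chat = Conversation()
chat.assistant_mentioned_place("Example Cafe")  # hypothetical place name
destination = chat.resolve("there")
```

The design point is the ordering: check for a pronoun first, and only treat the word as a possible proper name if that check comes up empty.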

Prioritize Relevant Results

When I asked Siri about Rachael’s house, she admirably recognized that I could mean Rachael S or Rachel T.

IMG 2090

What’s odd, though, is that only one of those people has an address in my contact list — so there’s no value in offering to tell me about the other one. It seems like it would have been a much better experience for Siri to have told me, “Here’s the home address for Rachael S”. That would have let me skip a needless step if I had meant Rachael S, while also warning me in case I had meant Rachel T.

When I asked again later, “How do I get to Rachael’s”, Siri seemed not to realize that I was using the colloquial “Person’s” as short for “Person’s house”, and instead prioritized a business search. Maybe that’s because businesses often use that same colloquial construction, but the very reason they use it is to evoke the familiarity of a friend’s home!

IMG 2093

What would have been super helpful is for Siri to take into account that the last person I had been texting with was Rachael S, or even how frequently I tend to communicate with her overall, and tip the algorithm strongly in her favor. I am not sure if Siri has such probabilistic weighting going on in the background, but I hope so, and I hope they keep improving it.
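Here is one way the two ideas in this section might fit together: first drop candidates who can't possibly answer the request (no address on file), then order whoever is left by how much I've been talking to them lately. The `Contact` fields, the 30-day window, and the response strings are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    has_address: bool
    messages_last_30_days: int  # assumed stand-in for "how often we talk"


def rank_candidates(candidates):
    """Keep only contacts with an address, most-contacted first."""
    usable = [c for c in candidates if c.has_address]
    return sorted(usable, key=lambda c: c.messages_last_30_days, reverse=True)


def respond(candidates):
    """Answer directly when only one candidate is actually useful."""
    ranked = rank_candidates(candidates)
    if not ranked:
        return "I couldn't find an address for anyone by that name."
    if len(ranked) == 1:
        # Naming the person still warns the user if they meant someone else.
        return f"Here's the home address for {ranked[0].name}."
    names = " or ".join(c.name for c in ranked)
    return f"Which one: {names}?"


# Illustrative data only; the message counts are invented.
people = [Contact("Rachael S", True, 40), Contact("Rachel T", False, 2)]
reply = respond(people)
```

With only one usable candidate, the sketch skips the "which one?" step entirely; it asks only when there is a real choice to make, and even then it lists the likelier person first.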

Your Data Had Better Be Good

IMG 2091

This one is mainly about how Apple Maps failed to merge two database entries that clearly refer to the same real-world business. That’s a little annoying.

But when I got directions to this place, Apple Maps’ turn-by-turn directions told me I had arrived when I was actually 13 blocks away from my destination! People increasingly trust technology to help them plan their day and get things done; when that trust is violated in a way that has ramifications in the real world, like getting lost on Queen Anne Hill, it feels a lot more real than when your word processor doesn’t work right. This was the day I put the Google Maps app on my home screen. :P

Context is Important

IMG 2092

When I asked Siri to read “these” text messages, I hoped that she would be able to recognize that I was currently looking at a screen full of texts from Liz. Instead, she came back with the non-sequitur that I had a text from Peter.

Siri is presented as an integral part of the operating system in that she’s available immediately from any screen. And the Siri interface seems to happen in context of what you’re doing — rather than being presented as a different place that you go to when you hold the Home button, the Siri UI appears as a transparent overlay on top of wherever you already are. So it makes sense that users would expect Siri to have some idea of what is happening on their screen at the moment.

Avoid Embarrassing Results

IMG 2094

Right when this meme was blowing up, I thought I’d see whether Siri knew about it just yet. As someone who doesn’t ever swear, I was pretty taken aback by how she interpreted my question. It seems that a technology that tries to prioritize respect should try very hard not to put words in a user’s mouth, especially not vulgar ones!

It’s clear that Siri knows this is a rude word, so it’s astounding to me that she’d rather interpret me that way and respond with unhelpful snark than find the next best polite match that might yield real results. I was mildly chagrined by it, but you could imagine plenty of people who would feel downright offended.

It doesn’t help that the correct interpretation of my question yielded another example of a wild guess using Siri’s Wikipedia search, when an ordinary web search would have yielded the correct result.

Eventually Apple did teach Siri to respond to this question with a random goofy noise from the video.

Complete Non-Sequiturs

IMG 2095

This starts out as a “Names Are Important” example, in that Siri can’t ever seem to recognize that “Ann” is a name — the name of a very dear friend whom I communicate with all the time. Instead, she always seems to hear “an” or “and”, with accordingly confused results.

For the life of me I can’t make out what Siri thought I was trying to get at here, that she would respond with a non-sequitur about messages.

IMG 0005

I… was just making a Star Trek joke…?

Don’t Expose Underlying Mechanisms

IMG 2096

I googled the title of this section to find out where I got it, and the first result was my own book. So I’m taking credit for it! As far as I can tell, Siri is being unnecessarily pedantic here by saying that Ann doesn’t have a home address, because the sole address for her is not tagged as “home”. But really, if there is only one address, it’s safe to assume that it is her house. Just use it! Don’t bother me with complaints that you’re not suuuuure whether it’s her house or not.
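The fallback being asked for is tiny. A minimal sketch, assuming addresses are stored as `(label, address)` pairs (an invented representation, not the real contacts format): prefer the one tagged "home", but if there is exactly one address of any kind, just use it.

```python
def home_address(addresses):
    """Pick the address to treat as someone's house.

    `addresses` is a list of (label, address) tuples; this shape is an
    assumption for illustration.
    """
    # An explicit "home" label always wins.
    for label, address in addresses:
        if label == "home":
            return address
    # No "home" label, but only one address on file: assume it's the house.
    if len(addresses) == 1:
        return addresses[0][1]
    # Several unlabeled candidates: this is the case worth asking about.
    return None


# Hypothetical contact with a single, differently tagged address.
anns_place = home_address([("other", "123 Example St")])
```

Only the genuinely ambiguous case (several addresses, none tagged "home") comes back as `None`, which is when it makes sense to bother the user with a question.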

Don’t Discard Meaningful Input

IMG 2106

I told you when! Making users repeat themselves is a great way to make them quit relying on your input method.

Take the Cheap Wins

IMG 0117 IMG 2212 IMG 0118

You would think that one of the easiest semantic mappings to make would be that five-star songs (or four- and five-star songs) are considered the user’s “favorites”. What a cheap win it would be for Siri to understand this request, and serve up happy-making tunes at your command! For relatively little effort (mapping “favorite” or “that I like” to “five-star”), the system could provide quite a lot of happiness. Instead, she plays a minor ’90s J-pop group that happens to have “Favorite” in their name, or a RHCP song I didn’t even know I had…

Fine, if Siri doesn’t understand what I mean by “favorite”, surely she understands “five-star”? Nope, if there isn’t a playlist by that exact name, the system pretends not to know what I’m talking about. Really, this is another case of “Don’t Expose Underlying Mechanisms”. Don’t let on that the only way you know of to organize music is saved playlists; take what I said and make it happen!
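To show how cheap this win could be, here is a minimal sketch that maps "favorite", "that I like", and "N-star" phrasing onto star ratings instead of looking for a playlist with that exact name. The function name, the `(title, stars)` library shape, and the choice to count four stars and up as "favorite" are assumptions for illustration.

```python
import re

WORD_TO_STARS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def songs_for_request(request, library):
    """Return matching titles, or None if the request isn't about ratings.

    `library` is a list of (title, stars) tuples (an invented format).
    """
    text = request.lower()
    # "five-star songs" or "five star songs" means exactly that rating.
    rated = re.search(r"\b(one|two|three|four|five)[- ]star\b", text)
    if rated:
        stars = WORD_TO_STARS[rated.group(1)]
        return [title for title, s in library if s == stars]
    # "favorite" / "that I like" is treated as four stars and up.
    if "favorite" in text or "that i like" in text:
        return [title for title, s in library if s >= 4]
    return None


# Made-up library for illustration.
library = [("Song A", 5), ("Song B", 3), ("Song C", 4)]
favorites = songs_for_request("Play my favorite songs", library)
```

A request that mentions neither ratings nor favorites returns `None`, so the usual search by artist, title, or playlist name can still run afterward.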

Provide Help at the Point of Need

IMG 0141 IMG 0142

“At the Point of Need” is a Tufte phrase we use a lot at work. If you have a good idea of when someone is likely to need something, you should do your best to provide it there, rather than making them hunt for it. Isn’t Siri a perfect interface for requesting a volume change? Considering that Siri is a strictly audio-based interface? Alas, you have to use the traditional volume-adjustment mechanisms.

Why Don’t You Do It For Me?

IMG 0145

Any time a computer gives me specific interaction instructions for how to do something, I want to say, “You’re the computer; if you know what needs to happen, why don’t you do it for me?” Thankfully, in this case Siri provides a button to get me partway to where I need to be to make this work. Good!

Don’t Casually Discard State

IMG 2125

“State” is all of the information about what is going on in the system. In this case, it’s the precise moment I left off at in the album I was listening to when I got out of the car, which I expected to be able to return to when I got back in. Instead of picking up where I left off, in this situation Siri cavalierly decided it would be fine to discard that state and just shuffle everything I own. Later I learned that I should have said “resume music”; I wish the system had double-checked with me when I’d used such an ambiguous word as “play”.

Interlude: Impressive Fuzzy Match

IMG 2165

Nicely done, Siri! Let’s rock out to the Mars Volta together.

Present Information In Context

IMG 2210

After a design class, some friends and I wanted to go see Her, a movie about an AI with a voice interface! Awesomely, Siri kinda failed at helping us go see it by presenting the showtimes while needlessly hiding the current time (ordinarily visible in the status bar). Movie showtimes only make sense in context of what time it is right now!


I’ve left out a lot of goofy, amusing, or dumbfounding screenshots in the interest of getting this post out there. But it’s already been almost a year since I started collecting these, so I think it’s about time I hit Post. I hope you found some useful, real-world hooks for why you should take users’ needs seriously. Siri is some impressive technology, and I am sure she will continue improving to match the expectations that people have of a human-seeming interface.