When
design leads to friendship, and that friendship leads back to design,
magic happens. This is the story of how an intern and her mentor
designed Apple’s original emoji set and together changed the way people
communicate around the world. It was also a project that led them to
become lifelong friends, a key ingredient in the success of these tiny
icons. In a nutshell, I was the intern and Raymond is my lifelong friend
and mentor. In the course of three months, together we created some of
the most widely used emoji: face with tears of joy, pile of poo, red
heart, and party popper, plus around 460 additional ones. Later, as a
full time Apple employee, I even got to create a few more.
It
was the summer of 2008, and I was one year away from receiving my MFA
in Graphic Design from the Rhode Island School of Design (RISD). It was
the same summer I landed an internship at Apple on a team I was eager to
meet. The same design team responsible for the iPhone; a magical device
that launched the year prior at Macworld Expo in San Francisco. One
could only imagine the size of my butterflies as I flew to Cupertino and
arrived at 1 Infinite Loop. To add to the uncontrollable fluttering, I
had no idea what project I would be given, the size of the team, where I
would sit, or if I could really bike to work (I’m terrible on bikes).
Soon
after my arrival and meeting the team (oh and biking to work!) I was
handed my project. I was still trying to make sense of the assignment
I’d just received when someone asked if I knew what an emoji was. And
well, I didn’t, and at the time, neither did the majority of the English
speaking world. I answered ‘no’. This would all change, of course, as
the iPhone would soon popularize them globally by offering an emoji
keyboard. Moments later I learned what this Japanese word meant and that
I was to draw hundreds of them. Just as I was looking down the hallway
and internally processing, “This isn’t type or an exercise in layout,
these are luscious illustrations,” I was assigned my mentor.
For
the next three months Raymond and I would share an office and
illustrated an array of faces, places, flags, animals, food, clothing,
symbols, holidays, sports, and well, you probably know the rest. But
long before any of this was complete, I had to learn how to design
Apple-styled icons. We split the batch and the lesson in humility and
craftsmanship began.
Raymond
designed the face with tears of joy and pile of poo and I designed the
red heart and party popper. Emoji examples are from Emojipedia.
Raymond
taught me everything there was to know about icon design. Little did I
know that he, my humble mentor, was one of the best icon designers in
the world. In other words, I sat next to one of the best iconographers,
got to pick his brain until I could kick off my training wheels, all the
while exchanging stories of our time growing up in South Florida
including our trips to ‘Pollo Tropical’ in the search of amazing
plantains. Lesson in humility, check.
My
first emoji was the engagement ring, and I chose it because it had
challenging textures like metal and a faceted gem, tricky to render for a
beginner. The metal ring alone took me an entire day. Pretty soon,
however, I could do two a day, then three, and so forth. Regardless of
how fast I could crank one out, I constantly checked the details: the
direction of the woodgrain, how freckles appeared on apples and
eggplants, how leaf veins ran on a hibiscus, how leather was stitched on
a football, the details were neverending. I tried really hard to
capture all this in every pixel, zooming in and zooming out, because
every detail mattered. And for three months I stared at hundreds of
emoji on my screen. Somewhere in there we also had our first Steve Jobs
review, which had created a shared experience of suspense and success
when they were approved for launch. And if Steve said it was good to go,
I’d say lesson in craftsmanship, check.
Sometimes our emoji turned out more comical than intended and some have a backstory. For example, Raymond reused his happy poop swirl as the top of the ice cream cone. Now that you know, bet you’ll never forget. No one else who discovered this little detail did either.
Another
example is the order in which we drew them. We left the tough ones to
last, so the dancer with the red dress emerged towards the end of my
internship as it was the one that kept getting punted. You can thank her
ruffled dress for that and Raymond for the final output. The woman’s
turquoise dress with the brown waist band, on the other hand, was one I
drew earlier in the process. It was inspired by the color palette and
proportions of a dress that my sister had created in real life that same
year.
So
from funny backstories to realizing he and I attended high schools less
than 30 miles apart, our shared past and days drawing together
triggered unstoppable laughing spells with watery eyes and all, in other
words, with tears of joy. Ten years after my internship, Raymond and I
still fill a room with laughter and he continues to provide me with the
most flat out honest feedback to keep me in check, and vice versa. All
this is what I believe made the emoji successful, our friendship through
design.
When
I spotted the ever-changing rock found at Bernal Heights Park in San
Francisco, both Raymond and I had to pay tribute to the magical pile of
poo. Photo taken in 2016.
This
year will mark the tenth anniversary of Apple’s original emoji launch.
They were first released in Japan on November of 2008, shortly after my
internship at Apple concluded. I had no idea that within a few months of
completing such project, it would revolutionize our culture’s way of
communicating or how the emoji would physically appear everywhere. And I
mean everywhere: toys, apparel, stickers, candy, music videos, books,
jewelry, landmarks, movies, and whatever else you’ve seen.
It
should be noted that although Raymond and I, Angela Guzman, are the
original Apple emoji designers responsible for the initial batch of
close to 500 characters (and were awarded a US patent for them), there
are of course additional Apple designers. Amongst them, Ollie Wagner
created around a dozen of the original set after the conclusion of my
internship, and many more the following year. The set now totals
somewhere in the thousands — some are even animated!
Ten years ago Raymond and I worked on one of my favorite projects to date, one that led me to experience my own ikigai.
This Japanese term is defined as the place where one’s passion,
mission, vocation, and profession intersect; what some would say the
reason to get up in the morning — literally me in 2008. I would eagerly
wake up, and on the days I had to bike, I’d carry my bike down three
long flights of stairs and head to work with a smile on my face. Now
that’s magic!
Gift
from Raymond upon the conclusion of my internship, his version of real
life emoji. The orange, apple, and eggplant were part of a set that I
had created.
On
that note, I would suggest to any designer looking for their reason to
get up in the morning to find their humble mentor, or be one, and get on
the road to friendship. Because magic happens when design leads to
friendship, and that friendship leads back to design. For every emoji
made, I learned something new. For every emoji made, Raymond and I
became better friends. The better friends we became, the better designer
I became. In this case, friendship and design happened one emoji at a
time. And that’s a story worth sharing.
Sufficient
visual hierarchy is a foundation of a successful digital product. It
helps to organize UI elements in an effective way so that content would
be easy to comprehend and pleasant to see. The presentation of visual
elements has a great impact on user experience. If the components are
organized wisely, users navigate and interact with a product without
efforts and enjoy the process.
So,
what makes powerful visual hierarchy? Of course, different kinds of
products require different methods of building it still there are some
common solutions helpful for UI content organization. Today’s article
provides useful tips on creating the compelling visual hierarchy for web
and mobile products.
Keep business goals in mind
There
are often business goals standing behind a digital product. To achieve
them, creative team needs to work out which UI elements are more
important and prioritize them according to their roles. For example, all
the elements on e-commerce websites perform the tasks of various
levels. The item images are usually the main eye-catchers since they
have to encourage customers to consider it. A heading goes after the
image explaining what it is and the next important stage is a CTA button
calling people to buy an item. By considering business and marketing
goals set for the website or app, the creative team can effectively
prioritize visual content and make a product stand out the crowd.
In our previous articles,
we mentioned that before reading a web page people scan it to get a
sense of whether they are interested. Different studies, including the
ones by Nielsen Norman Group, have revealed several popular scanning patterns among which “F” and “Z”-shaped.
F-pattern appears mainly on digital pages or screens with the big amount of content such as blogs, news platforms etc.
Users’ eyes move in F-shape: first, they scan a horizontal line on the
top of the screen, then move down the page a bit and read across the
shorter horizontal line, finishing with the vertical line down on the
left side of the copy where people look for keywords in the paragraphs’
initial sentences.
Z-shaped
pattern takes place on the pages which are not so heavily concentrated
on copy or those which don’t require scrolling down. The pattern is the
following: people first scan across the head of the page starting from
the top left corner, searching for core information, and then go down to
the opposite corner at a diagonal, finishing with the horizontal line
at the bottom of the page from its left to right.
Knowing
these patterns designers organize content putting all the core UI
elements on the most scanned spots to draw users’ attention.
Functionality first
The
visual hierarchy may seem to be oriented only to the aesthetic aspects
but it’s not like that. First of all, by structuring and organizing
visual elements designers need to make sure a product is clear to use
and the navigation works right. The visual hierarchy which is built
exceptionally on aesthetics can’t work effectively. User interface with
the badly structured content leads to the bad UX. So, while building
visual hierarchy designers need to consider functions of UI elements and
a role they play in the navigation process.
White
space, or negative space, is not just an area between design elements,
it is actually a core component of each visual composition. It is a tool
able to make all the user interface elements noticeable to users’ eyes.
Designers can group or separate UI components so that they could create
the effective layout. Moreover, negative space helps to emphasize
particular elements which require deep attention from users. White space
is an effective instrument for creating visual hierarchy so designers
need to work on its balanced usage.
We devoted one of our latest articles to the golden ratio
applied in design. It is a mathematical proportion of the elements of
different sizes which is thought to be the most aesthetically pleasing
for human eyes. The proportion equals 1:1.618 and it is often
illustrated with seashell-shaped spirals which many of you could have
probably seen.
Designers
often apply golden ratio at the stage of wireframing. It helps to plan a
structure for the layout placing and sizing user interface elements in
the right proportion which will be pleasant for users.
A
grid is one of the key tools applied at the different stages of the
creative process and visual hierarchy is not an exception. A grid helps
to structure all the components and put them into the appropriate sizes
and proportion. What’s more, designers can effectively work with the
negative space since a grid shows if the elements are placed
proportionally and even.
Add some colors
Color
choice and combinations are essential for visual hierarchy as they help
users to distinguish the core elements. The thing is that colors have
their own hierarchy which is defined by the power of influence on users’
mind. There are bold colors such as red and orange as well as the weak
ones like white and cream. Bold colors are easy to notice so designers
often use them as the means of highlighting or setting contrast.
Moreover,
applying one color to the several elements you can show that they are
somehow connected. For example, you can choose a red color for purchase
buttons so that people could intuitively find them when they need.
Visual
hierarchy includes a core subsection called typographic hierarchy which
aims at modifying and combining fonts to build the contrast between the
most meaningful and prominent copy elements which should be noticed
first and ordinary text information. The fonts can be transformed by
regulating sizes, colors, and families as well as their alignment.
Different fonts can divide copy content into different levels so that
users could perceive the information gradually. However, designers are
recommended to keep the number of fonts within three since too many
fonts look messy and make the design inconsistent.
Three levels for web, two for mobile
As
we mentioned above, different fonts form typographic levels which
consist of such elements as headlines, subheaders, body copy,
call-to-action elements, and captions. There are three typographic
levels: primary, secondary, and tertiary. The first one includes the
biggest type and aims at drawing people’s attention to the core
information on the screen. The next level provides copy elements which
are easily scanned and help users navigate through the content. The
tertiary level usually applies body text and some additional data which
is presented via relatively small type.
In
many cases, web products include all three levels since they are more
likely to provide the big amount of content. On the other hand,
designers are recommended to keep the number of layers within two while
creating typography for mobile. The small screens don’t provide enough
space for three levels so the elements of a secondary level such as
subheaders have to step aside to make mobile UI look clean.
Effective
visual hierarchy is not only about aesthetics. It aims at providing
problem-solving navigation and interaction systems as well as friendly
user experience. To create a sufficient visual hierarchy, designers need
to organize all UI elements considering the functionality and business
goals.
In
this article, I’d like to talk about the “divide in technology” and how
you can become proficient at solving tech problems even if you have
never done it before.
The Gap
There
is a fundamental divide in how people deal with tech problems. It seems
that some people see computers, smartphones and other technical devices
as “black boxes”, most of the time doing what they want, but at times
showing frustrating errors or just plainly stopping to work.
Others
(with a winking eye referred to as “tech people”) see those devices as a
system of parts: hardware, software and things that run on the
internet. While errors and failures are certainly annoying, they are
merely symptoms that some part of the system is malfunctioning. And since it’s technology, the various components can be fixed.
The
difference between those groups is that the first group is intimidated
by technology — you might hear someone say “Oh, he (the computer)
doesn’t like me”, as if it’s a personal thing and the technological
system can be blamed. The other group doesn’t put the blame on the
system as a whole, vicious entity, but instead treat it as it is: a
collection of parts.
It’s
no shame to belong to group one, after all, technological education and
systems thinking is rarely taught and if you never had someone else
introduce you to the topic, you were likely never exposed to the ideas
behind it. However, I encourage you to read on and discover it’s quite
easy to understand and to switch over to the “tech” side in no time.
Why
should you do this? Because it gives you power and control over the
things you own. You are absolutely capable of fixing and repairing both
software and hardware problems, once you understand the basics. And each
time you succeed in fixing something, you will gain confidence and
experience. Plus, it’s actually pretty fun.
Everything is just a collection of parts
As
mentioned in the intro, every piece of technology is a quite elaborate
collection of parts, divided into hardware and software. The hardware is
the actual thing that you carry around, most of the times small boards
or chips that fulfill a certain function.
Two good things: those components are similar on almost all systems (I’m talking about computers, tablets and smartphones).
They
all have a processor unit (doing the computations), a permanent storage
(where all your photos are for instance) and a temporary storage
(supplying the files that are in use right at the moment to the
processor).
Those
three are absolutely necessary for the basic functions. Then, of
course, you have things that support everything else: batteries,
screens, sensors, input devices (keyboards, trackpads), wireless chips
and a series of boards connecting everything together.
The
second good thing is that you don’t need to understand how each of
those components work (or even how the system works at all) and you can
still fix the system as a whole.
On
top of this, there is software: an operating system and applications
running on this system. Again, you don’t need to understand how this all
works, just be aware of its existence.
Have you tried turning it off and on again?
It
seems like a tired old joke, but it’s quite true. More than half of all
errors on almost all systems can be “fixed” by turning off the system
and restarting it.
This allows the system to begin with a blank slate, it reloads the software and starts all calculations afresh.
It
is truly the one thing that a “tech person” will do first when trying
to fix a problem. Switch everything off (completely, ideally also
disconnect the power), then back on. You will be surprised how many
errors are never showing up again! This technique can be adapted to
resetting and reinstalling software, but we’ll get into that in a later
article.
Turn it off. Turn it back on. Fix most of your errors.
You can’t really break something
I find that most of the time, people are not trying to fix things because they are afraid to damage those things permanently.
Another
good thing (this article is full of positivity): you can’t really break
something as long as you don’t physically break part of the system.
Keeping your technology dry and reasonably clean is a good way to start.
It’s
also quite unlikely that you damage your software beyond repair. Rest
assured that there is almost always a way to completely reset
everything. Which brings us to the next point and then we will go into
the details.
Store your files securely
As
mentioned above, your files are stored on the permanent storage (hard
drive) of your device. Luckily, in the last decade it has gotten
incredibly easy to also store all your files in the “cloud”, meaning a
separate computer somewhere on the internet, owned by a company.
The
most famous of these services, like Dropbox, iCloud, GoogleDrive and
OneDrive are reliable and widely used, while alternative might be suited
to special needs.
I
won’t go into any detail on how to choose the best service, you should
be fine with typing “best cloud storage providers 2018” into Google.
The
point is: while I said you can’t break anything on your system, you
might lose your files, programs, settings and achievements if you don’t
save them on another device first.
Use
a cloud storage, external hard drive or another computer to move
important documents out of your system for the time of the repairs.
Things change, for better or for worse, it’s never just you
You
know the saying: never change a running system. Many software
developers don’t seem to heed this, they are constantly updating,
improving, iterating and changing.
Most
of these changes are benign, while sometimes they break the very thing
that you rely on for your work. It is annoying, it costs energy and
time.
Yet, we all have to accept it, sort of the price we pay for getting accelerated technological progress.
And
despite the myriads of different technological configurations,
operating systems, smartphones and programs there are, there is a high
chance that someone, somewhere has already had the same problem and
found a solution and shared it with the world.
Which brings us to…
The big secret
This is the big one. The secret you have been waiting for. How do “tech people” actually fix things?
The answer, of course, is a simple process.
They google the error and then follow whatever other people have tried.
Yes,
that’s all. That is how most of the errors get solved and how most
things get repaired and in fact, how most things are learned.
You
just google what you are trying to do and then spend some time going
through the answers. It might not be the first answer that helps you,
but chances are that somewhere in the first five answers, something
will.
The
art is within the right phrasing of the question. I’ll walk you through
an example: recently, my 3D software “Blender” started to display black
boxes instead of the usual interface. It was mildly annoying, so I
tried to fix it.
Here
is how you construct the google query: type the program name first,
then add a short and succinct description of what’s wrong. For instance:
“blender 3d displaying black user interface”. Here is what Google gives
me:
Click on the first answer.
And
I simply go to the first answer, which is a site called
stackexchange.com. It is a platform/ community where lots of tech
questions are answered and it is quite trustworthy. Reading the question
that someone else asked, I think that they have the same issue. And
behold, below there is an answer.
Turn off nVidia shadowplay, thanks J. Larsen!
I know that shadowplay is a program for my graphics card, so I turned it off.
It fixed the issue, no more black boxes.
If
I didn’t know how to turn it off, guess what: I’d google it (“turning
off nvidia shadowplay”). There are tutorials for everything online.
This principle works with any error message, too.
Just
don’t click it away angrily, look at it, read it and if you don’t
understand it, copy the exact words into Google, combined with the
software from which it came, for instance “windows 10 error 0x80200056”.
It looks like gibberish and I have no clue what it means, but other
people do!
Put it into Google, read the first answer (like seriously, read it like a really good recipe) and follow it.
Remember, you are quite unlikely to break anything, so just follow the steps.
Yes,
there is a chance that your problem is absolutely rare and unique. It
happens to all of us. We live with it. We reinstall the whole system. We
buy a new computer. But we can always say that we tried.
I’ll probably go into a little more depth on this next week, but for now, you have a basic understanding of your tech!
The more you fix and try and change, the more confident you will become.
Voice is the future. The world’s technology giants are clamoring for vital market share, with ComScore projecting that “50% of all searches will be voice searches by 2020.”
However,
the historical antecedents that have led us to this point are as
essential as they are surprising. Within this report, we take a trip
through the history of speech recognition technology, before providing a
comprehensive overview of the current landscape and the tips that all
marketers need to bear in mind to prepare for the future.
The History of Speech Recognition Technology
Speech
recognition technology entered the public consciousness rather
recently, with the glossy launch events from the tech giants making
worldwide headlines.
The appeal is instinctive; we are fascinated by machines that can understand us.
From
an anthropological standpoint, we developed the spoken word long in
advance of its written counterpart and we can speak 150 words per
minute, compared with the paltry 40 words the average person can type in
60 seconds.
In
fact, communicating with technological devices via voice has become so
popular and natural that we may be justified in wondering why the
world’s richest companies are only bringing these services to us now.
The
history of the technology reveals that speech recognition is far from a
new preoccupation, even if the pace of development has not always
matched the level of interest in the topic. As we can see below, major
breakthroughs dating back to the 18th century have provided the platform
for the digital assistants we all know today.
Created by author
The
earliest advances in speech recognition focused mainly on the creation
of vowel sounds, as the basis of a system that might also learn to
interpret phonemes (the building blocks of speech) from nearby
interlocutors.
These
inventors were hampered by the technological context in which they
lived, with only basic means at their disposal to invent a talking
machine. Nonetheless, they provide important background to more recent
innovations.
Dictation
machines, pioneered by Thomas Edison in the late 19th century, were
capable of recording speech and grew in popularity among doctors and
secretaries with a lot of notes to take on a daily basis.
However,
it was not until the 1950s that this line of inquiry would lead to
genuine speech recognition. Up to this point, we see attempts at speech
creation and recording, but not yet interpretation.
Audrey,
a machine created by Bell Labs, could understand the digits 0–9, with a
90% accuracy rate. Interestingly, this accuracy level was only recorded
when its inventor spoke; it hovered between 70% and 80% when other
people spoke to Audrey.
This
hints at some of the persistent challenges of speech recognition; each
individual has a different voice and spoken language can be very
inconsistent. Unlike text, which has a much greater level of
standardization, the spoken word varies greatly based on regional
dialects, speed, emphasis, even social class and gender. Therefore,
scaling any speech recognition system has always been a significant
obstacle.
Alexander
Waibel, who worked on Harpy, a machine developed at Carnegie Mellon
University that could understand over 1,000 words, built on this point:
“So
you have things like ‘euthanasia’, which could be ‘youth in Asia’. Or
if you say ‘Give me a new display’ it could be understood as ‘give me a
nudist play’.”
Until
the 1990s, even the most successful systems were based on template
matching, where sound waves would be translated into a set of numbers
and stored. These would then be triggered when an identical sound was
spoken into the machine. Of course, this meant that one would have to
speak very clearly, slowly, and in an environment with no background
noise to have a good chance of the sounds being recognized.
IBM
Tangora, released in the mid-1980s and named after Albert Tangora, then
the world’s fastest typist, could adjust to the speaker’s voice. It
still required slow, clear speech and no background noise, but its use
of hidden Markov models allowed for increased flexibility through data
clustering and the prediction of upcoming phonemes based on recent
patterns.
Although
it required 20 minutes of training data (in the form of recorded
speech) from each user, Tangora could recognize up to 20,000 English
words and some full sentences.
The
seeds are sown here for voice recognition, one of the most significant
and essential developments in this field. It was a long-established
truism that speech recognition could only succeed by adapting to each
person’s unique way of communicating, but arriving at this breakthrough
has been much easier said than done.
It
was only in 1997 that the world’s first “continuous speech recognizer”
(ie. one no longer had to pause between each word) was released, in the
form of Dragon’s NaturallySpeaking software. Capable of understanding
100 words per minute, it is still in use today (albeit in an upgraded
form) and is favored by doctors for notation purposes.
Machine
learning, as in so many fields of scientific discovery, has provided
the majority of speech recognition breakthroughs in this century. Google
combined the latest technology with the power of cloud-based computing
to share data and improve the accuracy of machine learning algorithms.
This culminated in the launch of the Google Voice Search app for iPhone in 2008.
Driven
by huge volumes of training data, the Voice Search app showed
remarkable improvements on the accuracy levels of previous speech
recognition technologies. Google built on this to introduce elements of
personalization into its voice search results, and used this data to
develop its Hummingbird algorithm, arriving at a much more nuanced
understanding of language in use. These strands have been tied together
in the Google Assistant, which is now resident on almost 50% of all
smartphones.
It
was Siri, Apple’s entry into the voice recognition market, that first
captured the public’s imagination, however. As the result of decades of
research, this AI-powered digital assistant brought a touch of humanity
to the sterile world of speech recognition.
After
Siri, Microsoft launched Cortana, Amazon launched Alexa, and the wheels
were set in motion for the current battle for supremacy among the tech
giants’ respective speech recognition platforms.
In
essence, we have spent hundreds of years teaching machines to complete a
journey that takes the average person just a few years. Starting with
the phoneme and building up to individual words, then to phrases and
finally sentences, machines are now able to understand speech with a
close to 100% accuracy rate.
The
techniques used to make these leaps forward have grown in
sophistication, to the extent that they are now loosely based on the
workings of the human brain. Cloud-based computers have entered millions
of homes and can be controlled by voice, even offering conversational
responses to a wide range of queries.
That journey is still incomplete, but we have travelled quite some distance from the room-sized computers of the 1950s.
The Current Speech Recognition Landscape
Smartphones
were originally the sole place of residence for digital assistants like
Siri and Cortana, but the concept has been decentralized over the past
few years.
At
present, the focus is primarily on voice-activated home speakers, but
this is essentially a Trojan horse strategy. By taking pride of place in
a consumer’s home, these speakers are the gateway to the proliferation
of smart devices that can be categorized under the broad ‘Internet of
Things’ umbrella. A Google Home or Amazon Echo can already be used to
control a vast array of Internet-enabled devices, with plenty more due
to join the list by 2020. These will include smart fridges, headphones,
mirrors, and smoke alarms, along with an increased list of third-party
integrations.
Recent Google research
found that over 50% of users keep their voice-activated speaker in
their living room, with a sizeable number also reporting that they have
one in their bedroom or kitchen.
And
this is exactly the point; Google (and its competitors) want us to buy
more than one of these home devices. The more prominent they are, the
more people will continue to use them.
Their
ambition is helped greatly by the fact that the technology is now
genuinely useful in the accomplishment of daily tasks. Ask Alexa, Siri,
Cortana, or Google what the weather will be like tomorrow and it will
provide a handy, spoken summary. It is still imperfect, but speech
recognition has reached an acceptable level of accuracy for most people
now, with all major platforms reporting an error rate of under 5%.
As
a result, these companies are at pains to plant their flag in our homes
as early as possible. Hardware, for example in the shape of a home
speaker system, is not something most of us purchase often. For example,
if a consumer buys a Google Home, it seems probable that they will
complement this with further Google-enabled devices, rather than
purchase from a rival company and create a disjointed digital ecosystem
under their roof. Much easier to seek out devices that will enable
continuity and greater convenience.
For this reason, it makes sense for Amazon to sell the Echo Dot for as low as $29.99. That equates to a short-term financial loss for Amazon on each device sold, but the long-term gains will more than make up for it.
There are estimated to be 33 million smart speakers in circulation already (Voice Labs report, 2017) and both younger and older generations are adopting the technology at a rapid rate.
In
fact, the demographics of an assistant “superuser,” someone who spends
twice the amount of time with personal assistants on a monthly basis
than average — is a 52-year old woman, spending 1.5 hours per month with
assistant apps.
Perhaps
most importantly for the major tech companies, consumers are
increasingly comfortable making purchases through their voice-enabled
devices.
Google
reports that 62% of users plan to make a purchase through their speaker
over the coming month, while 58% use theirs to create a weekly shopping
list:
Short-term
conclusions about the respective business strategies of Amazon and
Google, in particular, are relatively easy to draw. The first-mover
advantage looks set to be marked in this arena, especially as speech
recognition continues to develop into conversational interactions that
lead to purchases.
We have written before
about the two focal points of the voice search strategy for the tech
giants: the technology should be ubiquitous and it must be seamless.
Voice is already a multi-platform ecosystem, but we are some distance
from the ubiquity it seeks.
To
gain insight into the likely outcome of the current competition, it is
worth assessing the strengths and weaknesses of the four key players in
western markets: Amazon, Google, Apple, and Microsoft.
Amazon
First-party Hardware: Echo, Echo Dot, Echo Show, Fire TV Stick, Kindle.
Digital Assistant: Alexa
Usage Statistics:
“Tens of millions of Alexa-enabled devices” sold worldwide over the 2017 holiday season (Amazon)
75% of all smart speakers sold to date are Amazon devices (Tech Republic)
The
Echo Dot was the number one selling device on Amazon over the holidays,
with the Alexa-enabled Fire TV stick in second place. (Amazon)
The
average Alexa user spends 18 minutes a month interacting with the
device, compared to just five minutes for Google Home (Gartner)
There are now over 25,000 skills available for Alexa (Amazon)
Overview:
The
cylindrical Echo device and its younger sibling, the Echo Dot, have
been the runaway hit of the smart speaker boom. By tethering the
speakers to a range of popular third-party services and ‘skills’, Amazon
has succeeded in making the Echo a useful addition to millions of
households.
As Dave Limp, head of Amazon devices, put it recently,
“We think of it as ambient computing, which is computer access that’s less dedicated personally to you but more ubiquitous.”
Ubiquity seems a genuine possibility, based on the sales figures.
After
a holiday season when the Echo Dot became the most popular product on
Amazon worldwide, the Alexa app occupied top position in the App Store,
ahead of Google’s rival product.
Amazon’s
heritage as an online retailer gives it an innate advantage when it
comes to monetizing the technology, too. The Whole Foods acquisition
adds further weight to this, with the potential to integrate the offline
and online worlds in a manner other companies will surely envy.
Moreover,
Amazon has never depended on advertising to keep its stock prices
soaring. Quite the contrary, in fact. As such, there is less short-term
pressure to force this aspect of their smart speakers.
With
advertisers keen to find a genuine online alternative to Google and
Facebook, Amazon is in a great position to capitalize. There is a fine
balancing act to maintain here, nonetheless. Amazon has most to lose, in
terms of consumer trust and reputation, so it will only move into
advertising for Alexa carefully.
The company denies it has plans to do so, but as research company L2 Inc wrote recently,
Amazon
has approached major brands asking if they would be willing to pay for
Amazon’s Choice, a designation given to best-in-class products in a
particular category.
We
should expect to see more attempts from Amazon to provide something
beyond just paid ads on search results. Voice requires new advertising
solutions and Amazon will tread lightly at first to ensure it does not
disrupt the Alexa experience. The recently announced partnership with publishing giant Hearst is a sign of things to come.
The
keys to Alexa’s success will be the integration of Amazon’s own assets,
along with the third-party support that has already led to the creation
of over 25,000 skills. With support announced
for new headphones, watches, fridges, and more, Amazon looks set to
stay at the forefront of voice recognition technology for some time to
come.
Google
First-party Hardware: Google Home, Google Home Mini, Google Home Max, Pixelbook, Pixel smartphones, Pixel Buds, Chromecast, Nest smart home products.
Digital Assistant: Google Assistant
Usage Statistics:
Google Home has a 24% share of the US smart speaker market (eMarketer)
There are now over 1,000 Actions for Google Home (Google)
Google Assistant is available on over 225 home control brands and more than 1,500 devices (Google)
The most popular Google Assistant apps are games, followed closely by home control applications (Voicebot.ai)
Overview:
Google
Assistant is directly tied to the world’s biggest search engine,
providing users with direct access to the largest database of
information ever known to mankind. That’s not a bad repository for a
digital assistant to work with, especially as Google continues to make
incremental improvements to its speech recognition software.
Recent
research from Stone Temple Consulting across 5,000 sample queries found
Google to be the most accurate solution, by quite some distance:
Combined
with Google Photos, Google Maps, YouTube, and a range of other
effective services, Google Assistant has no shortage of integration
possibilities.
Google
may not have planned to enter the hardware market again after the
lukewarm reception for its products in the past. However, this new
landscape has urged the search giant into action in a very serious way.
There is no room for error at the moment, so Google has taken matters
into its own hands with the Pixel smartphones, the Chromecast, and of
course the Home devices.
The
Home Mini has been very popular, and Google has added the Home Max to
the collection, which comes in at a higher price than even the Apple
HomePod. All bases are very much covered.
Google
knows that the hardware play is not a long-term solution. It is a
necessary strategy for the here and now, but Google will want to
convince other hardware producers to integrate the Assistant, much in
the same way it did with Android smartphone software. That removes the
expensive production costs but keeps the vital currency of consumer
attention spans.
This plan is already in action, with support just announced for a range of smart displays:
This
adds a new, visual element to consumer interactions with smart speakers
and, vitally, brings the potential to use Google Photos, Hangouts, and
YouTube.
Google
also wants to add a “more human touch” to its AI assistant and has
hired a team of comedians, video game designers, and empathy experts to
inject some personality.
Google
is, after all, an advertising company, so the next project will be to
monetize this technology. For now, the core aim is to provide a better,
more human experience than the competition and gain essential territory
in more households. The search giant will undoubtedly find novel ways to
make money from that situation.
Although
it was slower off the mark than Amazon, Google’s advertising nous and
growing range of products mean it is still a serious contender in both
the short- and long-term.
Apple
Hardware: Apple HomePod (Due to launch in 2018 at $349), iPhone, MacBooks, AirPods
Digital Assistant: Siri
Usage Statistics:
42.5% of smartphones have Apple’s Siri digital assistant installed (Highervisibility)
41.4 million monthly active users in the U.S. as of July 2017, down 15% on the previous year (Verto Analytics)
19% of iPhone users engage with Siri at least daily (HubSpot)
Overview:
Apple
retains an enviable position in the smartphone and laptop markets,
which has allowed it to integrate Siri with its OS in a manner that
other companies simply cannot replicate. Even Samsung, with its Bixby
assistant, cannot boast this level of synergy, as its smartphones
operate on Android and, as a result, have to compete with Google
Assistant for attention.
Nonetheless,
it is a little behind the curve when it comes to getting its hardware
into consumers’ home lives. The HomePod will, almost certainly, deliver a
much better audio experience than the Echo Dot or Google Home Mini,
with a $350 price tag to match. It will contain a host of impressive
features, including the ability to judge the surrounding space and
adjust the sound quality accordingly.
The
HomePod launch has been delayed, with industry insiders suggesting that
Siri is the cause. Apple’s walled garden approach to data has its
benefits for consumers, but it has its drawbacks when it comes to
technologies like voice recognition. Google has access to vast
quantities of information, which it processes in the cloud and uses to
improve the Assistant experience for all users. Apple does not possess
this valuable resource in anything like the same quantity, which has
slowed the development of Siri since its rise to fame.
That said, these seem likely to be short-term concerns.
Apple
will stay true to its core business strategy and it is one that is
served it rather well so far. The HomePod will sit at the premium end of
the market and will lean on Apple’s design heritage, with a focus on
providing a superior audio experience. It will launch with support for
Apple Music alone, so unless Apple opens up its approach to third
parties, it could be one for Apple fans only. Fortunately for Apple,
there are enough of those to ensure the product gains a foothold.
Whether its
Microsoft
Hardware: Harman/kardon Invoke speaker, Windows smartphones, Microsoft laptops
Digital Assistant: Cortana
Usage Statistics:
5.1% of smartphones have the Cortana assistant installed
Cortana now has 133 million monthly users (Tech Radar)
25% of Bing searches are by voice (Microsoft)
Overview:
Microsoft
has been comparatively quiet on the speech recognition front, but it
possesses many of the component parts required for a successful speech
recognition product.
With
a very significant share of the business market, the Office suite of
services, and popular products like Skype and LinkedIn, Microsoft
shouldn’t be written off.
Apple’s
decision to default to Google results over Bing on its Siri assistant
was a blow to Microsoft’s ambitions, but Bing can still be a competitive
advantage for Microsoft in this arena. Bing is a source of invaluable
data and has helped develop Cortana into a much more effective speech
recognition tool.
The
Invoke speaker, developed by Harman/kardon with Cortana integrated into
the product, has also been reduced to a more approachable $99.95.
There
are new Cortana-enabled speakers on the way, along with smart home
products like thermostats. This should see its levels of uptake
increase, but the persistent feeling is that Microsoft may be a little
late to this party already.
Where
Microsoft can compete very credibly is in the office environment, which
has also become a central consideration for Amazon. Microsoft is
prepared to take a different route to gain a foothold in this market,
but it could still be a very profitable one.
The Future of Speech Recognition Technology
We
are still some distance from realizing the true potential of speech
recognition technology. This applies both to the sophistication of the
technology itself and to its integration into our lives. The current
digital assistants can interpret speech very well, but they are not the
conversational interfaces that the technology providers want them to be.
Moreover, speech recognition remains limited to a small number of
products.
The rate of progress, compared to the earliest forays into speech recognition, is really quite phenomenal nonetheless.
As
such, we can look into the near future and envisage a vastly changed
way of interacting with the world around us. Amazon’s concept of
“ambient computing” seems quite fitting.
The
smart speaker market has significant room left to grow, with 75% of US
homes projected to have at least one by the end of 2020.
Now
that users are getting over the initial awkwardness of speaking to
their devices, the idea of telling Alexa to boil the kettle or make an
espresso does not seem so alien.
Voice is becoming an interface of its own, moving beyond the smartphone to the home and soon, to many other quotidian contexts.
We
should expect to see more complex input-output relationships as the
technology advances, too. Voice-voice relationships restrict the
potential of the response, but innovations like the Amazon Echo Show and
Google’s support for smart displays will open up a host of new
opportunities for engagement. Apple and Google will also incorporate
their AR and VR applications when the consumer appetite reaches the
required level.
Challenges
remain, however. First of all, voice search providers need to figure
out a way to provide choice through a medium that lends itself best to
short responses. Otherwise, how would it be possible to ensure that a
user is getting the best response to their query, rather than the
response with the highest ad budget behind it?
Modern
consumers are savvy and have access to almost endless information, so
any misjudgements from brands will be documented and shared with the
user’s network.
A
new study from Google has shown that there is an increasing acceptance
among consumers that brands will use smart speakers to communicate with
them. A sizeable number revealed a willingness to receive information
about deals and sales, with almost half wanting to receive personalized
tips:
Speech
recognition technology provides the platform for us to communicate
credibly, but it is up to marketers to make the relationship with their
audience mutually beneficial.
Key Takeaways
Brands
need to consider how they can make an interaction more valuable for a
consumer. The innate value proposition of voice search is that it is
quick, convenient, and helpful. It is only by assimilating with — and
adding to -this relationship between technology and consumer that they
will cut through. The Beauty and the Beast example provides an early,
cautionary tale for all of us.
Amazon
is in prime position to monetize its speech recognition technology, but
still faces obstacles. Sponsorship of Amazon’s Choice has been explored
as a route to gain revenue without losing customers.
Google
has made speech recognition a central focus for the growth of their
business. With a vast quantity of data at its disposal and increasing
third-party support, Google Assistant will provide a serious threat to
Amazon’s Alexa this year.
Marketers
should take advantage of technical best practices for voice search to
increase visibility today. While this technology is still developing, we
need to give it a helping hand as it completes its mammoth tasks.
The
best way to understand how people use speech recognition technology is
to engage with it frequently. Marketers serious about pinpointing areas
of opportunity should be conducting their own research at home, at work,
and on the go.
Hardik Gandhi is Master of Computer science,blogger,developer,SEO provider,Motivator and writes a Gujarati and Programming books and Advicer of career and all type of guidance.