By: Cayden Pierce
“Now I see what you’re saying!”
You’re fresh out of school and you just started your first job. You walk into a meeting and sit down with your computer. The CEO launches into his presentation. You’re sitting there, laptop out, ready to go, but in his first sentence he says “We need to double MRR by end of Q2, here’s how.” Oh, no. You know you’ve heard “MRR” before, but you can’t remember what it stands for. Also, you know these “Q’s” separate out the year, but when exactly is the “end of Q2”? You reach for your laptop, open your browser, and are about to type in “what is MRR” when you realize that your manager, and his manager, are sitting at your sides – you’re not sure you want to advertise your ignorance so soon. So you sneak out your phone and enter “what is MRR” into a search engine. A card pops up: “MRR stands for ‘monthly recurring revenue’.” Oh, of course! You knew that, you just couldn’t remember it. Then you do a search for “q2 months of year”, switch over to images, and see a chart that shows Q2 ending in June – oh yeah, of course, they’re quarters of the year. You glance up and realize the CEO is on the third slide, and you’ve missed the last 40-second explanation of why this MRR deadline is so important. Now you’re behind.
How could this have gone better? You already knew what MRR is from reading about it, you just weren’t able to recall it when your CEO spoke it aloud. You also have an intuitive understanding of “the end of June”, but you’re not used to thinking in quarters. Your search engine queries met your needs, but the process of pulling out your device, opening the app, and forming a query took you out of the context of the meeting and into the context of your phone, making you miss critical information that came next. What if that information had been available to you immediately, without requiring you to switch context to your phone? What if, as soon as you heard a word or concept that you didn’t remember or already know, you had immediate access to the information that you needed?
Imagine you have a personal team of assistants that are remotely hearing what you hear and seeing what you see. The team is comprised of experts in many domains – technology, business, neuroscience, history, etc. This team has been with you for several years, and has a detailed representation of everything that you’ve learned, your experiences, and even the ways that you think. In your meeting, when the CEO says “MRR”, your “business-expert agent” suspects you don’t remember this term, so he presses a “talk button” and whispers in your ear “monthly recurring revenue”. As soon as you hear that, you understand, and it happened so quickly that you didn’t lose track of what the CEO is saying. When “Q2” is said, your “data-expert agent” sees on his computer that you have never used quarters to refer to a time of year, so he quickly whispers in your ear “late June”. Again, this simple information takes only a brief moment to attend to and integrate, but it drastically increases your understanding of the conversation.
This type of system is a fundamental way to improve our ability to execute and increase our knowledge, intelligence, and rate of learning. Unfortunately, it’s not reasonable for 99.9% of people to have a full-time personal assistant team. However, with the modern ecosystem of AI, wearable devices, and powerful edge compute + storage, it’s just now becoming possible to fully automate this process and upgrade human cognition along the way. The result is a system I refer to as a “Contextual Search Engine”.
The Cognitive Workflow – Search Engines of Today (2023)
Search engines are the prime example of a technology that improves our cognitive capabilities. Today, they form a fundamental step in our cognitive workflow, so much so that nearly everyone – regardless of age, occupation, or interests – uses a search engine every day, multiple times a day. This article is about the next generation of search engine, so let’s start by looking at what a search engine actually is, and why search engines are fundamentally important to humanity.
When thinking, solving a problem, brainstorming ideas, or performing various cognitive tasks, one often comes across a knowledge gap. A knowledge gap is missing knowledge, or some knowledge that you may hold, but are not able to recall in the current moment and current context. One identifies a knowledge gap as information that, if accessible, would help one better perform the current cognitive task. The next step is to formulate a query whose answer would fill in the knowledge gap one has come across. One then switches context to their device, opens a search engine, types or speaks the query, and receives an answer. With the new information in hand, one is then freed to continue their cognitive task, empowered by their new knowledge.
1,000 years ago, things weren’t so easy. When one came across a knowledge gap, they had to live with it, or search for a learned person who might hold the knowledge they sought, or individually spend countless hours trying to figure it out for themselves. Needless to say, the difficulty of bridging knowledge gaps 1,000 years ago would stop most cognitive processes in their tracks. The problem of going from query to answer was very, very hard.
50 years ago, libraries had already become the chief method of solving the problem of going from a query to an answer. When one came across a knowledge gap, they could go to the library and search for books on the topic. The semantic ordering of the Dewey Decimal system and references in books allowed one to find related information. What used to take years could be done in a few days at the library.
Today, search engines have brought that process down to seconds by automating the most time-consuming step – going from a query to an answer. Instead of manually identifying useful sources and manually sifting through the information, search engines have indexed everything, allowing us to skip over resource identification, searching, reference hopping, etc. and go straight to the answer. This automation is not a trivial thing – it has completely revolutionized the way that humans think. The average person performs half a dozen to dozens of queries per day to find the answers or resources that they need. In conversations, while working on problems, or while learning new information, we’re constantly met with things that we don’t know. Search engines are the first place we turn to fill in these knowledge gaps.
Problems of Modern Search Engines
However, today’s search engines have weaknesses that drastically limit their power.
- They’re too slow to use. When you’re in the middle of a conversation and you don’t understand something that was said, you don’t have time to figure out the right question to ask, pull out your phone, and search for that thing. It’s very common during conversation that extra information would be useful, but the average time of 20 seconds that it takes to pull out one’s phone, open a browser, search, and find an answer is too long. This action is also too mentally resource intensive – it takes our attention away from our conversation as we switch context out of our conversation and into our phone. For this reason, search engines don’t live up to their potential in conversations.
- They help us when we know what we don’t know – they don’t help us when we don’t know what we don’t know. Our usage of search engines today is usually explicit and directed – we realize there’s some knowledge we lack, and we use a search engine to find it. The opportunity to discover new things – knowledge that we don’t know that we don’t know – is untapped by today’s search engines.
- They only act on public knowledge, not private knowledge. So often, the most valuable information in a given context comes from a previous conversation, an email, a book we read, etc., the presentation of which would trigger our existing memory. But today’s search engines consider none of these data sources.
A contextual search engine is an upgrade to the modern search engine that solves these problems. It solves them by automating the manual steps of today’s search engines – knowledge gap discovery, query formation, and search. A contextual search engine listens to your conversation, identifies a knowledge gap, and immediately provides an answer in a modality that doesn’t require you to switch context. The contextual search engine doesn’t just tell you what you know you don’t know – it continually searches for information relevant to the current conversation and presents that as a prompt for further thought, discussion, and exploration. The data that a contextual search engine searches includes not only public knowledge, but also all of the information that you have experienced in the past – your private knowledge base.
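The three automated steps named above (knowledge gap discovery, query formation, and search) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the author's implementation: `UserModel`, `PUBLIC_GLOSSARY`, and every function name are hypothetical, the gap detector is a naive acronym regex, and dictionary lookups stand in for real web search and personal-data search.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class UserModel:
    """Hypothetical model of what the user knows and has experienced."""
    known_terms: Set[str] = field(default_factory=set)          # lowercase terms the user recalls easily
    private_notes: Dict[str, str] = field(default_factory=dict)  # lowercase term -> note from the user's own data


# Toy stand-in for public knowledge; a real system would query a search backend.
PUBLIC_GLOSSARY = {
    "mrr": "monthly recurring revenue",
    "q2": "second quarter of the year (April to June)",
}


def find_knowledge_gaps(utterance: str, user: UserModel) -> List[str]:
    """Step 1: flag acronym-like tokens (e.g. MRR, Q2) the user may not recall."""
    gaps: List[str] = []
    for token in re.findall(r"\b[A-Z][A-Z0-9]{1,5}\b", utterance):
        if token.lower() not in user.known_terms and token not in gaps:
            gaps.append(token)
    return gaps


def form_query(term: str) -> str:
    """Step 2: turn a gap into a query; here the query is just the normalized term."""
    return term.lower()


def search(query: str, user: UserModel) -> Optional[str]:
    """Step 3: check private knowledge first, then fall back to public knowledge."""
    if query in user.private_notes:
        return f"{user.private_notes[query]} (from your notes)"
    return PUBLIC_GLOSSARY.get(query)


def contextual_search(utterance: str, user: UserModel) -> List[str]:
    """Run all three steps and return short cards to show or speak to the user."""
    cards = []
    for term in find_knowledge_gaps(utterance, user):
        answer = search(form_query(term), user)
        if answer is not None:
            cards.append(f"{term}: {answer}")
    return cards


if __name__ == "__main__":
    user = UserModel(known_terms={"ceo"}, private_notes={"q2": "ends in late June"})
    for card in contextual_search("We need to double MRR by end of Q2, here's how.", user):
        print(card)
```

Checking private notes before the public glossary reflects the idea that a reminder drawn from your own experience is often the fastest memory trigger. A real system would replace each step with something far more capable, for example a language model for gap detection and a ranked search over both data sources.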
Why Upgrade Conversations?
I care a lot about conversations. Human technology has been hyper focused on continuously improving our remote interactions, but we’ve done almost nothing to improve face-to-face interactions. A conversation with your friends in the living room today unfolds in a manner that is nearly identical to what we would have done 50 or 1,000 years ago. We augment ourselves in every way, yet the most important bit of our existence – our relationships and connection to others – remains largely unenhanced.
In fact, we are regressing in this regard. The early internet was about connection and communication – chat and conversations – and promised a future where we might enhance and extend the feelings of felt presence that we share with each other. Our modern internet has devolved into a circus of feeds. One-to-many distribution can be extremely useful and valuable, but this format does not replace social interaction and synchronous communication. Feeds are an asynchronous, depersonalizing channel that has largely been overtaken by attention-hacking information. “Social media” isn’t social – most of the content one views on social media is created by people that users don’t know and never will, and the modality of engagement is not a social one of presence and experience.
Conversations are a fundamentally important aspect to our existence as intelligent agents. They’re the foundation of our relationships with other intelligent beings. They’re where we learn, grow, explore, laugh, and cry. Conversations are the reason we achieve higher intelligence, as language encodes our knowledge and models of the world, and conversations are how we pass it around. Convo is king, yet it’s stuck in the stone age – it’s time to upgrade our conversations to allow us to understand each other deeper, learn more, and go further. It’s time to add a new layer to the synchronous channel of communication from person to person. It’s time to upgrade conversations.
Another aspect of “why conversations” is practical – conversations are the only time that live human thought is encoded in a way that we can digitally represent. Conversations cause us to make our stream of thoughts into speech, which we can capture with computers, and then do all sorts of wonderful things with.
Finally, conversations are a form of thinking. Speaking thoughts aloud, working ideas out with friends, and presenting information in a way that can be understood by others is an effective tool for critical thinking.
In summary, conversations are fundamentally important for three reasons:
- Communication, felt presence, and relationships are the most important things to humans.
- They are the only time we think in a way such that we can capture the stream of thought digitally.
- Conversations are thinking – and thus improving them is improving thinking.
Example User Stories of a Conversational Contextual Search Engine
The discussion so far has been somewhat abstract – knowledge gaps, queries, context, etc. What are some actual use cases of a contextual search engine, and how is it better than the search engines of today? Let’s look at a few concrete examples that just scratch the surface of what a contextual search engine could do:
- You’re pitching your startup to a venture capitalist. They interject and ask how your solution is different from “X Inc.”’s solution. You’ve heard of X before, and you have a big spreadsheet on your computer of all competitors, but you can’t remember exactly which one they are right now. Before you even realize your ignorance, the contextual search engine has searched the web for “X” and presented their logo and a quick summary of their business, jogging your memory. Milliseconds later, the contextual search engine pulls up the row in your spreadsheet where you describe your competitive advantage. You’re able to answer the venture capitalist with confidence.
- You’re talking to your coworker and they mention their kids aren’t practicing piano, and they’re thinking about paying them to practice. You start explaining there’s some related research on extrinsic vs. intrinsic reward that you read about years ago. Your contextual search engine hears this, finds your notes on the paper you’re discussing, summarizes them, and presents a couple of bullet points that trigger your memory of the paper immediately. You’re now able to remember and explain the relationship of the research to your coworker’s decision. A quick gesture allows you to send the paper reference to your coworker, backing up your claims.
- You’re taking a 3-week vacation to travel the world. You want to see places and cultures unlike any you’ve seen before, so you decide to go to Southeast Asia. Once there, you meet people from all over the world. It used to always be a little awkward and slow when people mentioned they’re from a place you’ve never heard of and know nothing about. Now though, whenever someone mentions a city or country, you instantly see a world map in the corner of your vision with a pin on the place mentioned. The map zooms in to a satellite view and reveals a closer look at the location, with information about the local language and population.
These are just a few of many, many examples in conversation where extra information, whether personal or public, could improve a user’s capabilities.
Platform – Human I/O
A contextual search engine requires a hardware platform that can:
- Sense and capture the user’s environment to understand their current context and conversation.
- Provide information to the user.
- Work in a way that is ignorable and doesn’t require a context switch to use.
There are a few options for how this type of system could be achieved:
- Smartphones and/or desktop computers. Contextual search engine output can be displayed on your screen. This is low-hanging fruit for video calls and a good environment for rapid prototyping. While a useful and valid contextual search engine could be built on just a phone or desktop computer interface during video calls, these platforms don’t work well in the real world – switching attention away from your conversation and into your phone is a bad user experience, taking you out of your conversation.
- In-ear microphone and speaker and/or audio smart glasses. Earbuds/AirPods and audio smart glasses (smart glasses without displays) can sense the environment with omnidirectional, far-field microphones and be paired with text-to-speech (TTS) to read out information to the user. This is a promising platform as it solves the context switching problem, but it suffers from the fact that speech is temporal and linear – you have to listen to it when it’s being said, or you miss it. You don’t have as much attentional control. Speech is ignorable, but if ignored, it can’t be absorbed afterwards.
- Smart glasses with display. Smart glasses have microphones to capture context and speakers to play speech. They solve the problems of context capture just like a wearable audio solution – but they also solve the problem of ignorability and context switching. They do so by presenting information visually to the user, such that they can ignore that information completely, or delay paying attention to it until they have a chance.
- Neural Interfaces. A semantic encoding and decoding interface that could read your thoughts and inject new thoughts, paired with all of the visual and audio wearable sensors on smart glasses.
Why Smart Glasses?
This article will explore the use of smart glasses as the platform for widespread consumer adoption of contextual search engines. This is because they meet the fundamental needs of a contextual search engine while remaining possible today (wearable, real-world-ready semantic neural interfaces are still some years away, as of this writing).
- Always with us – no matter where we are or what we’re doing.
- No context switch to receive information – always available visual and audio human input.
- Environment/contextual sensing – sensors hear what we hear and see what we see.
Challenges of Contextual Search Engines
It’s Q2 2023 right now, and we still don’t have smart glasses with displays that can be worn all day, every day. I personally own an array of the world’s most advanced smart glasses, some of which are not even on the market yet, and none of these have hit the physical comfort, social comfort, and usability bar to be worn all day. However, as I discuss elsewhere, the optics technology is coming of age, and the hardware OEMs are finally realizing the need to cut back on features and focus on comfort, such that I expect to see all-day, everyday smart glasses (with displays) hitting the consumer market in ~2024. That is to say, this problem won’t be around for long, and so now is a good time to start building.
The smart glasses hardware industry has had a problem for quite some time. Hardware makers define overly ambitious feature requirements for their hardware – stereoscopic HD displays with cameras, WiFi, multiple microphones, speakers that replace your headphones, a full applications processor, etc. The reality of the optics, wireless communications, processor, and battery technology today is that these requirements need to be massively rolled back to create a pair of smart glasses that are light and small enough for consumer adoption. Monochrome, monocular, camera-less, BLE/UWB/HBC communications, microcontroller as processor, ultra-low power designs are what is required to hit the form factor requirements – something that only a few companies actually take to heart, with most spending millions to build a 150 gram brick that you can’t wear for more than 30 minutes (if the battery isn’t dead by then).
Fortunately, most of the immediately valuable and sought-after use cases that have real ROI in users’ lives don’t need immersive mixed reality (MR) glasses – they can run on light, slim, feature-light smart glasses. Contextual search engines, intelligent assistants, live translation, live captions, shopping assistants, notifications, etc. are all possible with this type of glasses. All they need is a display and a microphone.
The realization that the first wave of smart glasses will all use microcontrollers means that apps will have to run on a connected smartphone. This realization is the reason we’ve built the Smart Glasses Manager – a way to run apps on your phone that you see and interact with on your glasses, allowing developers to write one simple app that runs on any pair of smart glasses: https://github.com/TeamOpenSmartGlasses/SmartGlassesManager
Context Capture: Signals and Sensors
The signal-to-noise ratio (SNR) of wearable environmental sensors needs to be high in order to understand the context and use it to provide helpful information.
Today, we have very powerful automatic speech recognition (ASR) systems when they’re running on high quality speech audio. However, the real world contains a lot of noise, and necessarily includes a distance between the worn sensor and the people the user is trying to transcribe. This noise and separation leads to poor signals. We will need our wearables to employ audio sensing technology that is enhanced for user speech recognition and also environmental speech recognition. Systems like omnidirectional, far-field microphones are a start, but we’ll likely need microphone arrays in our wearables to sense high quality audio to pump through our ASR models.
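To make the case for microphone arrays concrete, here is a small, idealized simulation of why combining several microphones can raise the signal-to-noise ratio. It is a sketch under strong assumptions that are not from the article: every microphone receives the same, already time-aligned speech signal (a sine tone stands in for speech), the noise at each microphone is independent Gaussian noise, and the function names and parameter values are made up for illustration. Under those assumptions, averaging N channels improves SNR by about 10·log10(N) dB; real rooms, correlated noise, and imperfect alignment give smaller gains.

```python
import math
import random


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))


def simulate_array_snr(num_mics, num_samples=8000, noise_std=1.0, seed=0):
    """SNR after averaging `num_mics` time-aligned channels (idealized model)."""
    rng = random.Random(seed)
    # Stand-in for speech: a 200 Hz tone sampled at 16 kHz (assumed values).
    signal = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(num_samples)]
    # Independent noise at each microphone.
    noises = [[rng.gauss(0, noise_std) for _ in range(num_samples)]
              for _ in range(num_mics)]
    # Averaging leaves the common signal unchanged and averages the noise down.
    avg_noise = [sum(ch[n] for ch in noises) / num_mics for n in range(num_samples)]
    return snr_db(signal, avg_noise)


if __name__ == "__main__":
    for mics in (1, 2, 4, 8):
        print(f"{mics} mic(s): {simulate_array_snr(mics):.1f} dB")
```

The averaging step is the simplest form of delay-and-sum beamforming with the delays already removed. In a wearable, estimating those delays for a moving speaker is a large part of the practical difficulty.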
Some of the challenges in sensing have led me to develop a wearable microphone array (https://github.com/CaydenPierce/MSA) that was eventually integrated into a pair of smart glasses (https://github.com/TeamOpenSmartGlasses/OpenSourceSmartGlasses). However, these are prototypes, and we’ll need consumer-ready products that are physically and socially comfortable, run all day, and achieve high SNR to enable powerful contextual search engines.
A contextual search engine is an always-available system that answers your questions before you ask, provides you with useful information exactly when you need it, enhances your memory, and deepens conversations by understanding your context and knowledge. Smart glasses are the ideal form factor to achieve a powerful, valuable contextual search engine in the near future.
This is a shortened version of a longer article by Cayden Pierce. If you’re interested in the full version that goes more in depth, you can find that here: https://medium.com/@caydenpierce4/the-future-of-search-engines-next-gen-conversations-through-contextual-search-2335d65019f5
1. Nils Pihl on synchronous conversation – https://youtu.be/4xu4_NThoZ4?t=665
2. The Remembrance Agent – https://link.springer.com/article/10.1007/BF01682024
3. Emex Labs – Building Contextual Search Engines – https://emexwearables.com/
4. The AR Show – Cayden Pierce Interview about Contextual Search Engines – https://www.thearshow.com/podcast/148-cayden-pierce