"Alexa, is my husband cheating on me?" "Hey Google, how do I hide money from the IRS?" "Siri, I'm so lonely."
People say things to voice assistants they wouldn't say to another person. And somewhere, on a server in Virginia or Oregon, those recordings exist. Maybe they've been listened to by a human reviewer. Maybe they've been used to train an AI model. Maybe they'll be subpoenaed in a divorce proceeding.
This isn't fearmongering. It's documented reality. Here's exactly what Amazon, Google, and Apple record, who listens, and what you can do about it.
What Gets Recorded
Amazon Alexa
Every time Alexa detects its wake word ("Alexa," "Echo," "Computer," or "Amazon"), it starts streaming audio to Amazon's cloud servers. The recording includes:
- Your voice command — Everything from the wake word to the end of the detected query
- A few seconds before the wake word — The "pre-roll" buffer that helps Alexa capture the full wake word
- Potentially more — If Alexa mishears background noise as a wake word (which happens more than you'd think), it records whatever follows
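The "pre-roll" buffer works like a rolling window: the device continuously overwrites the last couple of seconds of audio so that, when the wake word fires, the full word (and a moment before it) is already captured. Here's a minimal sketch of that idea in Python — the frame size and buffer length are illustrative values, not Amazon's actual parameters:

```python
from collections import deque

# Keep ~2 seconds of audio, chunked into 20 ms frames (illustrative values).
FRAME_MS = 20
BUFFER_SECONDS = 2
MAX_FRAMES = (BUFFER_SECONDS * 1000) // FRAME_MS  # 100 frames

pre_roll = deque(maxlen=MAX_FRAMES)  # oldest frames fall off automatically

def on_audio_frame(frame: bytes) -> None:
    """Called for every incoming microphone frame."""
    pre_roll.append(frame)

def on_wake_word_detected() -> bytes:
    """Return the buffered audio from just before the wake word."""
    return b"".join(pre_roll)

# Simulate 3 seconds of incoming audio: only the last 2 seconds survive.
for i in range(150):
    on_audio_frame(bytes([i % 256]) * 320)  # 320 bytes stands in for one frame

captured = on_wake_word_detected()
print(len(pre_roll))   # 100 frames retained
print(len(captured))   # 100 * 320 = 32000 bytes
```

The key point: the microphone is always feeding this buffer, even when nothing is "recorded" — only on a (real or false) wake word does the buffered audio leave the device.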
These recordings are stored in your Amazon account indefinitely by default. You can view and listen to every single recording in the Alexa app under Settings → Alexa Privacy → Review Voice History. Go look right now. Most people are surprised by how many recordings exist and how many are accidental triggers.
Amazon confirmed in 2019 that human reviewers listen to a sample of Alexa recordings to improve speech recognition. These reviewers see the recording along with the account's first name and device serial number.
Google Home / Google Assistant
Google's approach is similar. When the Google Home device detects "Hey Google" or "OK Google," it records and transmits:
- Your voice command — Wake word through the end of the detected query
- Audio context — A brief buffer of audio around the trigger
- Associated data — Your Google account, device location, time, and what action was taken
Google also confirmed human review of recordings in 2019. Reviewers transcribe audio snippets to improve speech recognition. Google paused the practice temporarily after a Belgian contractor leaked recordings to a news outlet, but has since resumed with modified protocols.
Google stores recordings in your account at myactivity.google.com — but only if the "Include audio recordings" option under Web & App Activity is turned on (Google made audio storage opt-in in 2020). When enabled, recordings are kept indefinitely unless you set 3-month or 18-month auto-delete.
Apple Siri
Apple has historically been more privacy-focused:
- Recordings are anonymized — Associated with a random identifier, not your Apple ID
- Recordings are stored for up to 6 months for quality improvement, then stripped of the random ID
- No persistent recording history you can review (unlike Alexa and Google)
- Human review requires opt-in — After the 2019 controversy, Apple made human review of Siri audio opt-in only
Apple's approach isn't perfect — they still collect audio data — but the anonymization and opt-in review are meaningful privacy improvements.
False Wake Words: The Real Problem
The most concerning aspect isn't the intentional commands — it's the accidental triggers. Voice assistants misidentify wake words constantly.
Research from Northeastern University found that smart speakers activate without a wake word between 1.5 and 19 times per day depending on the device, ambient noise, and what's playing on TV or radio.
Common false triggers for Alexa include words like "election," "a letter," "alexis," "I like to," and dozens of others that phonetically resemble "Alexa." For Google, phrases like "OK cool," "Hey boo," and "OK dude" can trigger recording.
Each false trigger records 5-15 seconds of ambient conversation. Over a year, that adds up to anywhere from a few hundred to several thousand unintentional recordings of private conversations, arguments, intimate moments, and things you'd never willingly share with a tech company.
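To put those figures together, here's a quick back-of-the-envelope calculation using the study's range of accidental activations and the seconds captured per trigger (all numbers come from the text above, not new measurements):

```python
# Accidental activations per day (Northeastern study range) and
# seconds recorded per false trigger (range cited above).
activations_per_day = (1.5, 19)
seconds_per_trigger = (5, 15)
DAYS = 365

low = activations_per_day[0] * DAYS    # 547.5 recordings/year at the low end
high = activations_per_day[1] * DAYS   # 6935 recordings/year at the high end

audio_low_min = low * seconds_per_trigger[0] / 60    # ~46 minutes of audio
audio_high_min = high * seconds_per_trigger[1] / 60  # ~1734 minutes (~29 hours)

print(f"{low:.0f}-{high:.0f} accidental recordings per year")
print(f"{audio_low_min:.0f}-{audio_high_min:.0f} minutes of captured audio")
```

Even at the low end, that's the better part of an hour per year of your home's ambient audio sitting on someone else's servers.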
What Companies Do With Your Data
Speech Recognition Training
All three companies use voice recordings to train their speech recognition models. Amazon and Google use real human reviewers (contractors, typically) to transcribe a subset of recordings. These reviewers hear the actual audio and type out what was said. Multiple leaks have shown that contractors sometimes share interesting or disturbing recordings.
Targeted Advertising (Amazon and Google)
Amazon doesn't directly say "we use Alexa recordings for ad targeting." What they say is "we use information from your interactions with Alexa to show you relevant advertising." The distinction is subtle and arguably meaningless.
If you ask Alexa about baby products, don't be surprised to see baby product ads on Amazon. Apple does not use Siri data for advertising — one of the clearest differentiators in Apple's privacy approach.
Law Enforcement
Amazon, Google, and Apple have all received law enforcement requests for voice assistant data. Amazon has complied with subpoenas for Alexa recordings in murder cases. The companies require a valid legal process (warrant, subpoena, or court order) before disclosing data. But the key point: the data exists to be disclosed.
How to Protect Your Privacy
Level 1: Basic Settings (Everyone Should Do This)
For Alexa:
- Alexa app → Settings → Alexa Privacy
- Set "Choose how long to save recordings" to Don't save recordings or 3 months
- Review Voice History → Delete all existing recordings
- Turn OFF "Help improve Amazon services" (disables human review)
For Google:
- Go to myactivity.google.com
- Web & App Activity → Turn OFF "Include audio recordings"
- Set auto-delete to 3 months
- Delete existing voice recordings
For Apple:
- Settings → Privacy & Security → Analytics & Improvements
- Turn OFF "Improve Siri & Dictation"
Level 2: Hardware Controls
Mute the microphone when not in use. All Echo devices have a physical mute button (red ring = muted). Google Home has a mute switch on the back. When muted, the device physically disconnects the microphone — it's a hardware kill, not a software toggle.
Move smart speakers out of bedrooms and private spaces. Kitchen is fine. Nightstand is not.
Level 3: Local Voice Processing
The holy grail is local processing — where voice recognition happens on your device without sending audio to the cloud.
Home Assistant Voice (Wyoming + Whisper + Piper): Home Assistant's voice initiative uses open-source speech-to-text (Whisper) and text-to-speech (Piper) running entirely on local hardware. A Raspberry Pi 5 or mini PC handles voice recognition without any cloud connection.
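If you want to try this, the Wyoming services can be run as containers and then added to Home Assistant via its Wyoming integration. Here's a minimal docker-compose sketch — the image names and ports are the commonly used defaults for the rhasspy Wyoming containers, but model names and tags change, so treat this as a starting point and check the current docs:

```yaml
# docker-compose.yml — local speech-to-text (Whisper) and text-to-speech (Piper)
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language en   # small model suits Pi-class hardware
    ports:
      - "10300:10300"   # Wyoming STT port
    volumes:
      - ./whisper-data:/data

  piper:
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-medium
    ports:
      - "10200:10200"   # Wyoming TTS port
    volumes:
      - ./piper-data:/data
```

In Home Assistant, add two Wyoming integrations pointing at these hosts and ports, then select them in a local Assist pipeline. No audio ever leaves your network.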
The tradeoff: local voice assistants are currently slower and less accurate. For basic smart home control (lights, thermostats, locks, media), it's already viable. For complex queries, cloud assistants are still significantly better.
Level 4: No Smart Speakers
The most private option is simply not having always-listening devices in your home. You can still have a fully automated smart home — motion sensors, schedules, and app control don't require voice assistants.
Physical buttons (like IKEA's SOMRIG or Aqara's Mini Switch) provide quick control without microphones.
The Bottom Line
Voice assistants are incredibly useful. They're also corporate microphones in your home that record more than you think, store data longer than you'd expect, and share it with more people than you'd want.
The middle ground: use voice assistants for convenience, apply Level 1 and Level 2 privacy protections, and keep smart speakers out of bedrooms and private spaces. For maximum privacy, explore local voice processing or skip always-listening devices entirely.
Your home should be your private space. Whether you choose convenience, privacy, or a balance of both — make it an informed choice, not an accidental one.
Originally published at SmartHomeMade.