LOG-002 2026-03-22 00:00 UTC

Receiving Voice in Discord After DAVE: A Field Report

Discord enforced end-to-end encryption for voice in March 2026. Here is what broke, what was undocumented, and how I patched my way through it.

On March 2, 2026, Discord flipped the switch on DAVE (Discord’s Audio and Video End-to-End Encryption), making end-to-end encryption mandatory for all voice channels. My bot, which had been happily receiving voice audio, transcribing it with Whisper, and responding via TTS, went completely silent overnight.

This is the story of getting it working again. It took five commits, one monkey-patch, and a heuristic I’m not proud of.

The setup

I’m building Citadel, a multi-tenant AI agent platform for Discord. Agents can join voice channels, listen to users via local STT (faster-whisper), run the input through an LLM, and speak back via TTS (Kokoro). The voice pipeline looks like:

User speaks → discord.py receives packets → PCM audio
  → faster-whisper (STT) → text
  → LLM agent → response text
  → Kokoro (TTS) → WAV file
  → discord.py plays audio → User hears response

Before DAVE, this worked. The discord-ext-voice-recv library handled packet receiving. Audio came in as PCM, I buffered it per-user, detected silence, and fired off transcription. Simple.
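
For illustration, the buffer-and-detect-silence step could look roughly like the sketch below. It is not the Citadel code: the UtteranceBuffer class, the RMS threshold, and the quiet-frame count are all hypothetical, and it assumes 16-bit little-endian PCM delivered one frame at a time.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


def rms(pcm: bytes) -> float:
    """Root-mean-square level of 16-bit signed little-endian PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    total = 0
    for i in range(count):
        sample = int.from_bytes(pcm[2 * i:2 * i + 2], "little", signed=True)
        total += sample * sample
    return math.sqrt(total / count)


@dataclass
class UtteranceBuffer:
    """Collects loud frames per user and emits them after a run of quiet frames."""

    silence_threshold: float = 500.0  # hypothetical RMS cutoff, tune per setup
    silence_frames: int = 25          # hypothetical: 25 frames of 20 ms = 0.5 s
    _audio: Dict[int, bytearray] = field(default_factory=dict)
    _quiet: Dict[int, int] = field(default_factory=dict)

    def feed(self, user_id: int, pcm: bytes) -> Optional[bytes]:
        """Add one frame; return the finished utterance once the user goes quiet."""
        if rms(pcm) >= self.silence_threshold:
            self._audio.setdefault(user_id, bytearray()).extend(pcm)
            self._quiet[user_id] = 0
            return None
        if user_id not in self._audio:
            return None  # silence before any speech: nothing to flush
        # Quiet frames are counted but not stored, so pauses are trimmed.
        self._quiet[user_id] += 1
        if self._quiet[user_id] < self.silence_frames:
            return None
        self._quiet.pop(user_id, None)
        return bytes(self._audio.pop(user_id))
```

Whatever feed() returns is what would be handed to the STT step. Keeping one buffer per user ID is also why the speaker identity matters so much later in this report.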

Then DAVE happened.

Problem 1: The library stack wasn’t ready

DAVE means every voice packet is now end-to-end encrypted. The receiving side needs to:

1. Complete a DAVE handshake with Discord
2. Decrypt each audio packet using the sender’s key
3. Then do the normal opus decode

The stable release of discord-ext-voice-recv didn’t know DAVE existed. Neither did the version of discord.py I was on.

The fix: Upgrade to discord.py 2.7.1 (which added DAVE support), pin discord-ext-voice-recv to an unmerged pull request (#54 — the only branch with DAVE-compatible receiving), and add the davey library for the E2EE crypto:

# pyproject.toml
"discord.py[voice]>=2.7.1",
"discord-ext-voice-recv @ git+https://github.com/imayhaveborkedit/discord-ext-voice-recv.git@refs/pull/54/head",
"davey>=0.1.0",

Yes, I’m pinning a dependency to an unmerged PR via git ref. Not ideal, but it was the only path forward.
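
A pin like this is easy to lose on a rebuild or a cached environment, and an older install fails silently rather than loudly. One possible guard is a startup check against the minimum versions listed above. The sketch below is an assumption-laden example: the parse and unmet helpers are hypothetical, version parsing is deliberately naive (leading numeric components only), and it cannot tell whether discord-ext-voice-recv really came from the PR branch.

```python
import re
from importlib import metadata
from typing import Callable, Dict, List, Tuple

# Minimum versions copied from the pins above.
MINIMUMS = {"discord.py": "2.7.1", "davey": "0.1.0"}


def parse(version: str) -> Tuple[int, ...]:
    """Keep only the leading numeric components, e.g. '2.8.0a1' becomes (2, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(n) for n in match.group().split(".")) if match else ()


def unmet(minimums: Dict[str, str],
          lookup: Callable[[str], str] = metadata.version) -> List[str]:
    """Return a human-readable problem for each missing or too-old distribution."""
    problems = []
    for dist, minimum in minimums.items():
        try:
            installed = lookup(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if parse(installed) < parse(minimum):
            problems.append(f"{dist} {installed} is older than {minimum}")
    return problems
```

Calling unmet(MINIMUMS) at startup and refusing to join voice when the list is non-empty turns a silent regression into an explicit error message.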

Problem 2: The bot was joining deaf

After the upgrade, the bot connected to voice channels successfully. Logs showed the DAVE handshake completing. But no audio packets arrived. Zero. Nothing in the callback.

This one took an embarrassingly long time to find. The connect() call defaults to self_deaf=True. A deafened bot doesn’t receive audio packets from Discord — the server simply doesn’t send them.

# Before (broken):
vc = await voice_channel.connect(cls=voice_recv.VoiceRecvClient)

# After (working):
vc = await voice_channel.connect(cls=voice_recv.VoiceRecvClient, self_deaf=False)

One keyword argument. Hours of debugging.

But there was a second issue hiding here: even with self_deaf=False, the first few packets were garbage. The bot was starting to listen before the DAVE handshake finished, so it was trying to process encrypted packets without the decryption keys.

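import asyncio

# Wait for the DAVE handshake before listening (this runs inside an async function).
# _connection and dave_session are library internals, hence the hasattr guards.
if hasattr(vc, '_connection') and hasattr(vc._connection, 'dave_session'):
    for _ in range(50):  # 50 polls x 0.1 s = up to 5 seconds
        ds = vc._connection.dave_session
        if ds and ds.ready:
            break
        await asyncio.sleep(0.1)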

There’s no event or callback for “DAVE is ready.” You poll internal state.
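
The polling loop above falls through silently if the handshake never completes. A small generic helper that reports whether the wait succeeded makes that case visible. This is a sketch: wait_until is a hypothetical name, and nothing in it is specific to discord.py or davey.

```python
import asyncio
import time
from typing import Callable


async def wait_until(predicate: Callable[[], bool],
                     timeout: float = 5.0,
                     interval: float = 0.1) -> bool:
    """Poll predicate until it is true; return False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


# Possible use with the internal state polled above (inside an async function):
#   ready = await wait_until(
#       lambda: bool(getattr(vc._connection.dave_session, "ready", False)))
#   if not ready:
#       ...  # log it or disconnect instead of listening to undecryptable audio
```

Using a monotonic deadline rather than a fixed iteration count keeps the timeout accurate even when the event loop is busy and individual sleeps overrun.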

Problem 3: Crashes inside the packet decoder

Packets were now arriving and being decrypted. But the bot would receive audio for a few seconds, then stop completely. No error in my code. The voice-recv library’s PacketRouter loop was dying silently.

I needed visibility into what was happening inside the library. I added a monkey-patch on PacketDecoder._process_packet to log everything:

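# Keep a reference to the unpatched method so the wrapper can delegate to it.
_original_process_packet = PacketDecoder._process_packet

def _debug_process_packet(self, packet):
    member = self._get_cached_member()
    ds = self.vc._connection.dave_session
    # Keyword-style structured logging: one event name plus fields.
    logger.warning("voice_dave_debug",
        member=str(member),
        has_decrypted_data=packet.decrypted_data is not None,
        dave_session_ready=ds.ready if ds else None,
    )
    return _original_process_packet(self, packet)

# Install the wrapper in place of the library method.
PacketDecoder._process_packet = _debug_process_packet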

The logs revealed two bugs in voice-recv’s PR #54:

Bug A: The first packet for any user had member=None. The library maps a packet’s SSRC (a numeric stream ID) to a Discord user via the SPEAKING event, but the first audio packet often arrives before that event does. PR #54’s code then calls dave_session.decrypt(member.id, ...), which raises AttributeError because member is None.

Bug B: When an opus decode failed (corrupted or partially-decrypted packet), the OpusError exception propagated up and killed the PacketRouter’s entire receive loop. One bad packet = the bot stops hearing everything.

The fix: I replaced the debug monkey-patch with a full reimplementation of _process_packet. The key principles: never crash, skip bad packets gracefully.

def _patched_process_packet(self, packet):
    pcm = None
    member = self._get_cached_member()

    if member is None:
        self._cached_id = self.sink.voice_client._get_id_from_ssrc(self.ssrc)
        member = self._get_cached_member()

    # DAVE decrypt — skip if member is still unknown
    if (
        _has_dave
        and member is not None
        and not packet.is_silence()
        and packet.decrypted_data is not None
        and self.vc._connection.dave_session is not None
        and self.vc._connection.dave_session.ready
    ):
        try:
            packet.decrypted_data = self.vc._connection.dave_session.decrypt(
                member.id, _MediaType.audio, bytes(packet.decrypted_data)
            )
        except Exception:
            return  # skip this packet

    if not self.sink.wants_opus():
        try:
            packet, pcm = self._decode_packet(packet)
        except Exception:
            return  # skip corrupted packet

    data = voice_recv.opus.VoiceData(packet, member, pcm=pcm)
    self._last_seq = packet.sequence
    self._last_ts = packet.timestamp
    return data

Every failure point returns None instead of raising. Losing a packet is imperceptible in a voice stream. Losing the entire receive loop is fatal.
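
Swallowing every exception has a cost: if something breaks systematically, the bot goes quiet without a trace. One way to keep the skip-and-continue behavior while retaining visibility is to count what gets dropped. The decorator below is a sketch with hypothetical names (skip_on_error, dropped), and decode is only a stand-in for a real opus decode.

```python
import functools
import logging
from collections import Counter

log = logging.getLogger("voice")
dropped = Counter()  # exception class name -> number of skipped packets


def skip_on_error(func):
    """Return None instead of raising, and count what was skipped."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            name = type(exc).__name__
            dropped[name] += 1
            if dropped[name] == 1:
                # Log only the first occurrence to avoid flooding at packet rate.
                log.warning("first dropped packet due to %s: %s", name, exc)
            return None
    return wrapper


@skip_on_error
def decode(payload: bytes) -> bytes:
    # Stand-in for a real decode step: rejects empty payloads.
    if not payload:
        raise ValueError("empty payload")
    return payload * 2
```

A handful of drops per minute is normal for voice. A counter that climbs on every packet points to a real fault, such as missing keys, rather than ordinary loss.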

Problem 4: The SPEAKING event never fires

With the crash fixes in place, audio was flowing. But member was still None on every single packet. The SSRC-to-user mapping never populated because Discord wasn’t sending the SPEAKING event at all.

This is the one that made me question everything. The SPEAKING event is how Discord tells you “SSRC 12345 belongs to user 67890.” Without it, you have audio packets from an anonymous source. You can decode them, but you don’t know who’s talking.

I couldn’t fix Discord. I couldn’t fix the library. So I wrote a heuristic:

if member is None and hasattr(self.vc, 'channel') and self.vc.channel:
    non_bot_members = [m for m in self.vc.channel.members if not m.bot]
    if len(non_bot_members) == 1:
        member = non_bot_members[0]
        self.vc._add_ssrc(member.id, self.ssrc)
        self._cached_id = member.id

If we can’t identify the speaker from protocol events, look at who’s actually in the voice channel. If there’s exactly one human, it must be them. Cache the mapping so every subsequent packet resolves instantly.

Is this hacky? Yes. Does it break with multiple users? Yes: with more than one human in the channel it never fires, and speakers stay unidentified. Does it work for the 1-on-1 voice agent use case? Perfectly.
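
The decision rule is small enough to isolate and test away from Discord. The sketch below restates it as a pure function; Member here is a hypothetical stand-in carrying only the id and bot fields the rule reads, and sole_human is an illustrative name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Member:
    """Minimal stand-in for a channel member: only the fields the rule needs."""
    id: int
    bot: bool


def sole_human(members: Iterable[Member]) -> Optional[Member]:
    """Return the only non-bot member, or None when the guess would be ambiguous."""
    humans = [m for m in members if not m.bot]
    return humans[0] if len(humans) == 1 else None


agent = Member(id=1, bot=True)
alice = Member(id=2, bot=False)
bob = Member(id=3, bot=False)

one_on_one = sole_human([agent, alice])    # resolves to alice
group_call = sole_human([agent, alice, bob])  # ambiguous, so None
```

Returning None for the ambiguous case, rather than guessing, means the fallback can leave a speaker unidentified but never attributes audio to the wrong person.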

What I learned

Pin to reality, not releases. When the ecosystem hasn’t caught up to a protocol change, you use what exists — even if it’s an unmerged PR. You can upgrade to a proper release later. Shipping matters more than dependency hygiene.

Defaults are the first place to look. self_deaf=True is a reasonable default for bots that play music. It’s a silent killer for bots that need to listen. When something doesn’t work at all, check the defaults before diving into protocol traces.

Monkey-patching is a valid debugging tool. I wouldn’t ship a debug monkey-patch, but I did ship a functional one. When a library’s internals are broken and you can’t wait for a fix, patching at the right layer is better than forking.
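
One way to keep a patch contained, especially a debugging one, is to make it reversible. The context manager below is a generic sketch: patched is a hypothetical helper, and Decoder is a made-up class standing in for a library internal.

```python
import contextlib


@contextlib.contextmanager
def patched(owner, name, replacement):
    """Temporarily replace owner.<name>, restoring the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original  # hand back the original so a wrapper can delegate to it
    finally:
        setattr(owner, name, original)


class Decoder:  # hypothetical stand-in for a library class
    def process(self, packet: str) -> str:
        return packet.upper()


def noisy_process(self, packet: str) -> str:
    # Replacement behavior used only while the patch is active.
    return packet + "!"


with patched(Decoder, "process", noisy_process):
    during = Decoder().process("a")
after = Decoder().process("a")
```

A long-lived functional patch would be installed once at import time instead, but the same save-the-original discipline makes it straightforward to remove when the upstream fix lands.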

Skip, don’t crash. Voice is lossy by nature. Humans don’t notice a few dropped packets. They definitely notice when the bot goes deaf. Every error handler in the packet pipeline should swallow and continue, not propagate and kill.

Heuristics have their place. The SSRC fallback is objectively fragile. It’s also the only thing that works right now, and it covers the primary use case. I’ll replace it when the library fixes SPEAKING event handling. Until then, it ships.

The full monkey-patch lives in app/integrations/discord/voice.py. If you’re building a Discord voice bot in 2026 and hitting DAVE issues, start there.