
Image Voice Memos

Voice memos and automatic transcription for your photos, right on your Mac.

Apple Silicon · macOS 15+ · SwiftUI · On-Device AI

Features

Everything you need to document photos with your voice.

🎙️

Instant Recording

Audio hardware is pre-warmed, so recording starts without delay after a 1-second countdown.

📝

Transcription

Automatic speech-to-text using Apple's Speech Framework. Supports German and English.

🌐

Translation

German transcriptions can optionally be translated to English, entirely on-device.

🖼️

RAW Support

Supports NEF, RAF, ORF, DNG and all common image formats. Fast thumbnails via CGImageSource.

📁

Folder-Based

No import needed: just pick a folder. Memos are stored as sidecar files alongside your photos.

🔒

Privacy First

Runs exclusively on Apple Silicon. No cloud, no external APIs. Sandboxed with Hardened Runtime.

Workflow

The complete flow from folder selection to finished transcription.

User Workflow

flowchart TD
    A["📁 Select Folder"] --> B["🖼️ Load Photo Grid"]
    B --> C["👆 Select Photo"]
    C --> D{"Voice memo\nexists?"}

    D -- No --> E["🎙️ Start Recording"]
    D -- Yes --> K["▶️ Playback / 🗑️ Delete"]

    E --> F["⏱️ 1s Countdown\n+ Hardware Warmup"]
    F --> G["🔴 Recording\n+ Waveform Display"]
    G --> H["⏹️ Stop"]
    H --> I["💾 PCM → AAC\nConversion"]
    I --> J["📝 Transcription\n(Speech Framework)"]
    J --> L{"Translation\nenabled?"}
    L -- Yes --> M["🌐 DE → EN\nTranslation"]
    L -- No --> N["✅ Done"]
    M --> N
    K --> C

    style A fill:#1a3a5c,stroke:#5e9eff,color:#e8e8e8
    style G fill:#5c1a1a,stroke:#ff5e5e,color:#e8e8e8
    style N fill:#1a5c2a,stroke:#34d058,color:#e8e8e8
            

State Machine: VoiceMemoState

stateDiagram-v2
    [*] --> noNote
    noNote --> countingDown : Record
    countingDown --> recording : Countdown = 0
    recording --> converting : Stop
    recording --> noNote : Cancel
    converting --> noteExists : AAC saved
    noteExists --> playing : Play
    noteExists --> countingDown : Re-Record
    noteExists --> noNote : Delete
    playing --> paused : Pause
    playing --> noteExists : Stop / End
    paused --> playing : Resume
    paused --> noteExists : Stop
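
The transitions in the diagram above can be encoded as a pure function over an enum. The following is a minimal sketch, not the app's actual implementation: the state names are taken from the diagram, while the `VoiceMemoEvent` type, its case names, and the `next(on:)` method are assumptions introduced for illustration.

```swift
// States as named in the diagram above.
enum VoiceMemoState: Equatable {
    case noNote, countingDown, recording, converting, noteExists, playing, paused
}

// Hypothetical event names; the diagram only gives transition labels.
enum VoiceMemoEvent {
    case record, countdownFinished, stop, cancel, conversionFinished
    case play, reRecord, delete, pause, resume, playbackEnded
}

extension VoiceMemoState {
    /// Returns the state reached from `self` on `event`,
    /// or nil if the diagram defines no such transition.
    func next(on event: VoiceMemoEvent) -> VoiceMemoState? {
        switch (self, event) {
        case (.noNote, .record): return .countingDown
        case (.countingDown, .countdownFinished): return .recording
        case (.recording, .stop): return .converting
        case (.recording, .cancel): return .noNote
        case (.converting, .conversionFinished): return .noteExists
        case (.noteExists, .play): return .playing
        case (.noteExists, .reRecord): return .countingDown
        case (.noteExists, .delete): return .noNote
        case (.playing, .pause): return .paused
        case (.playing, .stop), (.playing, .playbackEnded): return .noteExists
        case (.paused, .resume): return .playing
        case (.paused, .stop): return .noteExists
        default: return nil
        }
    }
}
```

Returning nil for undefined transitions lets a caller ignore invalid input, for example a Play action while a recording is still being converted.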
            

Architecture

MVVM pattern with SwiftUI, @MainActor ViewModels, and specialized services.

System Overview

graph TD
    subgraph V["Views"]
        CV[ContentView] --> DV[DetailView]
        CV --> PG[PhotoGridView]
    end

    subgraph VM["ViewModels"]
        LVM[LibraryViewModel]
        VNVM[VoiceMemoViewModel]
    end

    subgraph S["Services"]
        ARS[AudioRecording]
        TS[Transcription]
        TRS[Translation]
        ILS[ImageLoading]
    end

    subgraph FS["File System"]
        M4A[".m4a"]
        TXT[".txt"]
        ENTXT[".en.txt"]
    end

    PG --> LVM
    DV --> VNVM
    LVM --> ILS
    VNVM --> ARS
    VNVM --> TS
    VNVM --> TRS
    ARS --> M4A
    TS --> TXT
    TRS --> ENTXT

    style V fill:#1a2a3a,stroke:#5e9eff,color:#e8e8e8
    style VM fill:#2a1a3a,stroke:#a05eff,color:#e8e8e8
    style S fill:#1a3a2a,stroke:#34d058,color:#e8e8e8
    style FS fill:#3a2a1a,stroke:#f0883e,color:#e8e8e8
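
To illustrate how a `@MainActor` view model might delegate to services, here is a rough sketch under stated assumptions. Only the name `VoiceMemoViewModel` comes from the diagram; the two protocols, their method signatures, and the `processRecording(at:)` method are hypothetical. It uses Foundation only, so the SwiftUI observation mechanism is left out.

```swift
import Foundation

// Hypothetical service protocols; names and signatures are assumptions.
protocol TranscriptionServicing: Sendable {
    func transcribe(audioAt url: URL) async throws -> String
}

protocol TranslationServicing: Sendable {
    func translateToEnglish(_ germanText: String) async throws -> String
}

// All state lives on the main actor so a UI could read it safely.
@MainActor
final class VoiceMemoViewModel {
    private(set) var transcript: String?
    private(set) var translation: String?
    var isTranslationEnabled = true

    private let transcriber: any TranscriptionServicing
    private let translator: any TranslationServicing

    init(transcriber: any TranscriptionServicing,
         translator: any TranslationServicing) {
        self.transcriber = transcriber
        self.translator = translator
    }

    /// Transcribes a finished recording, then optionally translates it.
    func processRecording(at audioURL: URL) async throws {
        let text = try await transcriber.transcribe(audioAt: audioURL)
        transcript = text
        if isTranslationEnabled {
            translation = try await translator.translateToEnglish(text)
        } else {
            translation = nil
        }
    }
}
```

Injecting the services through protocols keeps the view model testable with stubs, without touching audio hardware or the speech stack.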
            

Audio Pipeline

flowchart TD
    MIC["Microphone"] --> PCM["PCM 44.1kHz .caf"]
    PCM --> AAC["AAC .m4a"]
    AAC --> PLAY["Playback"]
    AAC --> SF["Speech Framework"]
    SF --> TXT["Transcript .txt"]
    TXT --> TR["Translation Framework"]
    TR --> EN["Translation .en.txt"]

    style MIC fill:#5c1a1a,stroke:#ff5e5e,color:#e8e8e8
    style AAC fill:#1a3a5c,stroke:#5e9eff,color:#e8e8e8
    style TXT fill:#1a5c2a,stroke:#34d058,color:#e8e8e8
    style EN fill:#3a2a1a,stroke:#f0883e,color:#e8e8e8
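
The PCM to AAC step of this pipeline could be sketched with `AVAudioFile`, which encodes on write when given AAC settings. This is one possible approach, not necessarily the one the app uses; the function name `convertToAAC(from:to:)`, the `ConversionError` type, and the 4096-frame chunk size are assumptions.

```swift
import AVFoundation

// Hypothetical error type for this sketch.
enum ConversionError: Error {
    case bufferAllocationFailed
}

/// Reads a PCM file (for example a .caf) and writes it as AAC.
/// The container is inferred from the destination extension (.m4a).
func convertToAAC(from sourceURL: URL, to destinationURL: URL) throws {
    let source = try AVAudioFile(forReading: sourceURL)
    let format = source.processingFormat

    // Keep the source sample rate and channel count; only the codec changes.
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatMPEG4AAC,
        AVSampleRateKey: format.sampleRate,
        AVNumberOfChannelsKey: Int(format.channelCount),
    ]
    let destination = try AVAudioFile(forWriting: destinationURL, settings: settings)

    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 4096) else {
        throw ConversionError.bufferAllocationFailed
    }

    // Copy in chunks; AVAudioFile encodes each chunk as it is written.
    while source.framePosition < source.length {
        try source.read(into: buffer)
        if buffer.frameLength == 0 { break }
        try destination.write(from: buffer)
    }
}
```

The output file is finalized when `destination` is deallocated at the end of the function.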
            

Tech Stack

Native macOS technologies, zero external dependencies.

SwiftUI: Declarative UI Framework
AVFoundation: Audio Recording & Playback
Speech Framework: On-Device Transcription
Translation: Local Translation DE→EN
CGImageSource: RAW Thumbnail Extraction
App Sandbox: Security-Scoped Bookmarks
XcodeGen: Project Generation via YAML
Hardened Runtime: Security Hardening
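
As an illustration of thumbnail extraction via CGImageSource, the sketch below asks ImageIO for a bounded thumbnail, which lets it use a preview embedded in the file when one exists. The function name `makeThumbnail(at:maxPixelSize:)` and the chosen options are assumptions, not the app's confirmed code.

```swift
import Foundation
import ImageIO

/// Returns a thumbnail no larger than `maxPixelSize` on its longest side,
/// or nil if the file cannot be read as an image.
func makeThumbnail(at url: URL, maxPixelSize: Int) -> CGImage? {
    // Avoid caching the decoded full-size image; only the thumbnail is needed.
    let sourceOptions = [kCGImageSourceShouldCache: false] as CFDictionary
    guard let source = CGImageSourceCreateWithURL(url as CFURL, sourceOptions) else {
        return nil
    }

    let thumbnailOptions: [CFString: Any] = [
        // Use an embedded thumbnail if present, otherwise create one.
        kCGImageSourceCreateThumbnailFromImageIfAbsent: true,
        // Apply the orientation stored in the image metadata.
        kCGImageSourceCreateThumbnailWithTransform: true,
        kCGImageSourceThumbnailMaxPixelSize: maxPixelSize,
    ]
    return CGImageSourceCreateThumbnailAtIndex(source, 0, thumbnailOptions as CFDictionary)
}
```

Note that an embedded preview may be smaller than the requested size; `kCGImageSourceCreateThumbnailFromImageAlways` would force a thumbnail from the full image at a higher decoding cost.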

File Structure: Sidecar Pattern

| File | Path | Description |
| --- | --- | --- |
| Photo | /Folder/photo.jpg | Original image file |
| Voice Memo | /Folder/.voicememos/photo.m4a | AAC audio (converted from PCM) |
| Transcript | /Folder/.voicememos/photo.txt | Speech-to-text result |
| Translation | /Folder/.voicememos/photo.en.txt | English translation |
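
The path layout above can be derived from a photo URL alone. The following minimal sketch assumes the sidecar base name is the photo's file name without its extension, as the example paths suggest; the `SidecarPaths` type and its property names are hypothetical.

```swift
import Foundation

/// Sidecar locations for one photo, following the layout listed above.
struct SidecarPaths {
    let audio: URL
    let transcript: URL
    let translation: URL

    init(photoURL: URL) {
        // Sidecars live in a hidden ".voicememos" folder next to the photo.
        let directory = photoURL
            .deletingLastPathComponent()
            .appendingPathComponent(".voicememos", isDirectory: true)
        // Assumption: the base name is the file name without its extension.
        let baseName = photoURL.deletingPathExtension().lastPathComponent

        audio = directory.appendingPathComponent("\(baseName).m4a")
        transcript = directory.appendingPathComponent("\(baseName).txt")
        translation = directory.appendingPathComponent("\(baseName).en.txt")
    }
}

let paths = SidecarPaths(photoURL: URL(fileURLWithPath: "/Folder/photo.jpg"))
print(paths.audio.path)
```

Under this assumption, two photos that differ only in extension (for example a JPEG and a RAW file of the same shot) would map to the same sidecar files.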