Voice memos and automatic transcription for your photos, right on your Mac.
Apple Silicon · macOS 15+ · SwiftUI · On-Device AI
Everything you need to document photos with your voice.
Audio hardware is pre-warmed, so recording starts without delay after a 1-second countdown.
Automatic speech-to-text using Apple's Speech Framework. Supports German and English.
German transcriptions are optionally translated to English, entirely on-device.
Supports NEF, RAF, ORF, DNG and all common image formats. Fast thumbnails via CGImageSource.
No import needed: just pick a folder. Memos are stored as sidecar files alongside your photos.
Runs exclusively on Apple Silicon. No cloud, no external APIs. Sandboxed with Hardened Runtime.
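As an illustration of the CGImageSource approach mentioned above, here is a minimal sketch of thumbnail generation with ImageIO. The function name `makeThumbnail(at:maxPixelSize:)` and the chosen options are assumptions for this example, not the app's actual code.

```swift
import Foundation
import ImageIO

/// Hypothetical helper: returns a thumbnail for the image at `url`,
/// or nil if the file is missing or cannot be decoded.
func makeThumbnail(at url: URL, maxPixelSize: Int) -> CGImage? {
    // Avoid caching the fully decoded image; only the thumbnail is needed.
    let sourceOptions = [kCGImageSourceShouldCache: false] as CFDictionary
    guard let source = CGImageSourceCreateWithURL(url as CFURL, sourceOptions) else {
        return nil
    }

    let thumbnailOptions: [CFString: Any] = [
        // Use an embedded preview when the file has one (common for RAW files),
        // and only decode the full image when none is present.
        kCGImageSourceCreateThumbnailFromImageIfAbsent: true,
        // Apply the EXIF orientation so portrait shots are not shown sideways.
        kCGImageSourceCreateThumbnailWithTransform: true,
        // Upper bound for the longer edge, in pixels.
        kCGImageSourceThumbnailMaxPixelSize: maxPixelSize
    ]
    return CGImageSourceCreateThumbnailAtIndex(source, 0, thumbnailOptions as CFDictionary)
}
```

With `kCGImageSourceCreateThumbnailFromImageIfAbsent`, an embedded preview may be smaller than `maxPixelSize`. Switching to `kCGImageSourceCreateThumbnailFromImageAlways` trades speed for a consistent size.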
The complete flow from folder selection to finished transcription.
flowchart TD
A["Select Folder"] --> B["Load Photo Grid"]
B --> C["Select Photo"]
C --> D{"Voice memo\nexists?"}
D -- No --> E["Start Recording"]
D -- Yes --> K["Playback / Delete"]
E --> F["1s Countdown\n+ Hardware Warmup"]
F --> G["Recording\n+ Waveform Display"]
G --> H["Stop"]
H --> I["PCM → AAC\nConversion"]
I --> J["Transcription\n(Speech Framework)"]
J --> L{"Translation\nenabled?"}
L -- Yes --> M["DE → EN\nTranslation"]
L -- No --> N["Done"]
M --> N
K --> C
style A fill:#1a3a5c,stroke:#5e9eff,color:#e8e8e8
style G fill:#5c1a1a,stroke:#ff5e5e,color:#e8e8e8
style N fill:#1a5c2a,stroke:#34d058,color:#e8e8e8
stateDiagram-v2
[*] --> noNote
noNote --> countingDown : Record
countingDown --> recording : Countdown = 0
recording --> converting : Stop
recording --> noNote : Cancel
converting --> noteExists : AAC saved
noteExists --> playing : Play
noteExists --> countingDown : Re-Record
noteExists --> noNote : Delete
playing --> paused : Pause
playing --> noteExists : Stop / End
paused --> playing : Resume
paused --> noteExists : Stop
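The state diagram above can be expressed as a small transition function. This is a sketch only: the names `MemoState`, `MemoEvent` and `nextState(from:on:)` are hypothetical, and the app's real implementation is not shown in the source.

```swift
// States taken directly from the diagram.
enum MemoState {
    case noNote, countingDown, recording, converting, noteExists, playing, paused
}

// Events named after the diagram's transition labels (hypothetical names).
enum MemoEvent {
    case record, countdownFinished, stop, cancel, aacSaved
    case play, reRecord, delete, pause, resume, playbackEnded
}

/// Returns the state that follows `state` when `event` occurs,
/// or nil if the diagram defines no such transition.
func nextState(from state: MemoState, on event: MemoEvent) -> MemoState? {
    switch (state, event) {
    case (.noNote, .record):                  return .countingDown
    case (.countingDown, .countdownFinished): return .recording
    case (.recording, .stop):                 return .converting
    case (.recording, .cancel):               return .noNote
    case (.converting, .aacSaved):            return .noteExists
    case (.noteExists, .play):                return .playing
    case (.noteExists, .reRecord):            return .countingDown
    case (.noteExists, .delete):              return .noNote
    case (.playing, .pause):                  return .paused
    case (.playing, .stop),
         (.playing, .playbackEnded):          return .noteExists
    case (.paused, .resume):                  return .playing
    case (.paused, .stop):                    return .noteExists
    default:                                  return nil
    }
}
```

Returning nil for undefined pairs makes invalid actions, such as pausing while recording, easy to detect and ignore in one place.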
MVVM pattern with SwiftUI, @MainActor ViewModels, and specialized services.
graph TD
subgraph V["Views"]
CV[ContentView] --> DV[DetailView]
CV --> PG[PhotoGridView]
end
subgraph VM["ViewModels"]
LVM[LibraryViewModel]
VNVM[VoiceMemoViewModel]
end
subgraph S["Services"]
ARS[AudioRecording]
TS[Transcription]
TRS[Translation]
ILS[ImageLoading]
end
subgraph FS["File System"]
M4A[".m4a"]
TXT[".txt"]
ENTXT[".en.txt"]
end
PG --> LVM
DV --> VNVM
LVM --> ILS
VNVM --> ARS
VNVM --> TS
VNVM --> TRS
ARS --> M4A
TS --> TXT
TRS --> ENTXT
style V fill:#1a2a3a,stroke:#5e9eff,color:#e8e8e8
style VM fill:#2a1a3a,stroke:#a05eff,color:#e8e8e8
style S fill:#1a3a2a,stroke:#34d058,color:#e8e8e8
style FS fill:#3a2a1a,stroke:#f0883e,color:#e8e8e8
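To illustrate how a `@MainActor` ViewModel might sit between the views and the services in the graph above, here is a minimal sketch. The protocols `Transcribing` and `Translating`, the method names, and the `process(audioAt:)` entry point are assumptions for this example. Only the class name `VoiceMemoViewModel` comes from the diagram.

```swift
import Foundation

// Hypothetical service interfaces; the app's real protocols are not shown in the source.
protocol Transcribing: Sendable {
    func transcribe(audioAt url: URL) async throws -> String
}

protocol Translating: Sendable {
    func translateToEnglish(_ german: String) async throws -> String
}

// In a SwiftUI app this class would typically also be ObservableObject or @Observable;
// that is omitted here to keep the sketch free of UI dependencies.
@MainActor
final class VoiceMemoViewModel {
    private(set) var transcript: String?
    private(set) var translation: String?
    var translationEnabled = true

    private let transcriber: any Transcribing
    private let translator: any Translating

    // Services are injected so they can be replaced with stubs in tests.
    init(transcriber: any Transcribing, translator: any Translating) {
        self.transcriber = transcriber
        self.translator = translator
    }

    /// Transcribes the recording, then translates it if translation is enabled.
    func process(audioAt url: URL) async throws {
        let text = try await transcriber.transcribe(audioAt: url)
        transcript = text
        if translationEnabled {
            translation = try await translator.translateToEnglish(text)
        } else {
            translation = nil
        }
    }
}
```

Because the services are passed in through protocols, the ViewModel can be exercised with in-memory stubs, without a microphone or speech recognition permissions.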
flowchart TD
MIC["Microphone"] --> PCM["PCM 44.1 kHz .caf"]
PCM --> AAC["AAC .m4a"]
AAC --> PLAY["Playback"]
AAC --> SF["Speech Framework"]
SF --> TXT["Transcript .txt"]
TXT --> TR["Translation Framework"]
TR --> EN["Translation .en.txt"]
style MIC fill:#5c1a1a,stroke:#ff5e5e,color:#e8e8e8
style AAC fill:#1a3a5c,stroke:#5e9eff,color:#e8e8e8
style TXT fill:#1a5c2a,stroke:#34d058,color:#e8e8e8
style EN fill:#3a2a1a,stroke:#f0883e,color:#e8e8e8
Native macOS technologies, zero external dependencies.
| File | Path | Description |
|---|---|---|
| Photo | /Folder/photo.jpg | Original image file |
| Voice Memo | /Folder/.voicememos/photo.m4a | AAC audio (converted from PCM) |
| Transcript | /Folder/.voicememos/photo.txt | Speech-to-text result |
| Translation | /Folder/.voicememos/photo.en.txt | English translation |
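The naming scheme in the table can be derived from a photo's URL as in the following sketch. `SidecarPaths` and `sidecarPaths(forPhoto:)` are hypothetical names. Only the `.voicememos` folder and the three file extensions come from the table.

```swift
import Foundation

/// Locations of the sidecar files belonging to one photo (hypothetical type).
struct SidecarPaths {
    let audio: URL
    let transcript: URL
    let translation: URL
}

/// Builds the sidecar locations for `photo` following the table's layout:
/// a hidden `.voicememos` folder next to the photo, files named after the photo's stem.
func sidecarPaths(forPhoto photo: URL) -> SidecarPaths {
    let directory = photo
        .deletingLastPathComponent()
        .appendingPathComponent(".voicememos", isDirectory: true)
    // "photo.jpg" -> "photo"
    let stem = photo.deletingPathExtension().lastPathComponent
    return SidecarPaths(
        audio: directory.appendingPathComponent("\(stem).m4a"),
        transcript: directory.appendingPathComponent("\(stem).txt"),
        translation: directory.appendingPathComponent("\(stem).en.txt")
    )
}

let paths = sidecarPaths(forPhoto: URL(fileURLWithPath: "/Folder/photo.jpg"))
print(paths.audio.path)  // prints "/Folder/.voicememos/photo.m4a"
```

Under this stem-based scheme, `photo.jpg` and `photo.nef` in the same folder would map to the same sidecar names. The source does not say how the app handles that case.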