Batch MP3 to Text

Script to convert MP3 audio files to text transcriptions. Optimized for Apple Silicon chips (M1/M2/M3).

Features

🚀 Batch processing - Transcribe multiple MP3 files automatically
🎯 Apple Silicon optimized - Uses MLX-Whisper for 8-10x faster processing on M1/M2/M3 chips
🎚️ Audio preprocessing - Optional noise reduction and normalization
📊 Real-time progress - Live progress bars during transcription
🧠 Smart memory management - Handles large files with automatic memory cleanup
🌍 Multi-language support - Supports 90+ languages including English, Dutch, French, German, Spanish, and more
⚙️ Easy configuration - External config file for customization without editing scripts

Requirements

macOS with Apple Silicon (M1/M2/M3)
Python 3.9+
ffmpeg - For audio processing
mlx-whisper - ML model for transcription

Installation

1. Install Homebrew (if not installed)

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Install dependencies

# Install ffmpeg
brew install ffmpeg

# Install Python 3.11
brew install python@3.11

# Install MLX-Whisper
pip3 install mlx-whisper

3. Download this script

git clone https://github.com/YOUR_USERNAME/batch-mp3-to-text.git
cd batch-mp3-to-text
chmod +x transcribe_workflow.sh

Quick Start

Option 1: GUI Launcher (Easiest for macOS)

Place your MP3 files in the audio_files/ directory
Double-click launch_transcription.command
The launcher will:
- Show you which files will be transcribed
- Wait for you to press ENTER to start
- Ask you to choose a preprocessing level (1, 2, or 3)
- Run the transcription automatically
- Show results when complete
- Keep your Mac awake during transcription (uses caffeinate)

Option 2: Command Line

Place your MP3 files in the audio_files/ directory
Run the script:

./transcribe_workflow.sh

Choose your preprocessing level when prompted (1, 2, or 3)
Wait for transcriptions to complete
Find your transcripts in transcriptions/

To prevent your Mac from sleeping during long transcriptions, wrap the command in caffeinate (the GUI launcher already does this by default):

caffeinate -i ./transcribe_workflow.sh

Custom Input Directory

./transcribe_workflow.sh /path/to/your/audio/files

Configuration

Configuration File: `transcribe.conf`

The script uses an external configuration file (transcribe.conf) that allows you to customize all settings without modifying the script itself. This file is automatically loaded when the script runs.

What's in the config file:

🎚️ Preprocessing levels (none/basic/enhanced)
🤖 Whisper model selection (tiny/base/small/medium/large)
🌍 Language settings (90+ languages supported)
📏 File size limits and chunking behavior
⚙️ Feature toggles (skip existing, keep originals, diarization)
🎛️ Advanced audio settings (sample rate, bitrate, filters)
📊 Output preferences (combined transcripts, naming, formats)

Variable precedence: Environment variables > Config file > Script defaults

This means you can override config file settings by setting environment variables when running the script.
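
As a rough illustration of that precedence, the sketch below shows one way a shell script could resolve two of the settings. The function name load_settings and the fallback values are assumptions made for this example; this is not code taken from transcribe_workflow.sh.

```shell
# Hypothetical sketch: environment variable > config file > script default.
load_settings() {
    local conf_file="$1"
    # Remember any values that were already set in the environment.
    local env_language="${LANGUAGE-}"
    local env_level="${PREPROCESS_LEVEL-}"

    # Values from the config file overwrite whatever is currently set.
    [ -f "$conf_file" ] && . "$conf_file"

    # Values from the environment win over the config file.
    [ -n "$env_language" ] && LANGUAGE="$env_language"
    [ -n "$env_level" ] && PREPROCESS_LEVEL="$env_level"

    # Script defaults (assumed here) fill in anything still unset.
    : "${LANGUAGE:=nl}"
    : "${PREPROCESS_LEVEL:=basic}"
}
```

With a pattern like this, a one-off override would look like `LANGUAGE=en ./transcribe_workflow.sh`, while the config file keeps its own value for later runs.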

Quick Start: Using the Config File (Recommended)

The config file is already included (transcribe.conf)

Edit the config file:

nano transcribe.conf
# or
open -e transcribe.conf  # Opens in TextEdit on macOS

Customize your settings:

# Example: Change language to English
LANGUAGE="en"

# Example: Use enhanced preprocessing for noisy audio
PREPROCESS_LEVEL="enhanced"

# Example: Use the large model for best accuracy
WHISPER_MODEL="mlx-community/whisper-large-v3-mlx"

Run the script normally - it will automatically load your config

The config file includes 280+ lines of detailed comments explaining each setting, all available options, and what each change means for performance and quality.

Choosing Preprocessing Level

When you run the script, you'll be asked to choose a preprocessing level:

Choose audio preprocessing level:
  1) None      - No preprocessing (fastest)
  2) A little  - Mono + normalization + basic filtering (recommended)
  3) A lot     - Advanced noise reduction (best for noisy audio)

Enter your choice (1, 2, or 3):

The preprocessing level affects audio quality before transcription:

Option 1 (None) - No processing, fastest
Option 2 (A little) - Recommended for most files
Option 3 (A lot) - Best for noisy recordings, slower
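
The menu numbers appear to correspond to the level names used in the config file (none, basic, enhanced). A minimal sketch of that mapping, with a hypothetical function name and the 1/2/3 correspondence treated as an assumption:

```shell
# Hypothetical sketch: translate a menu choice into a preprocessing level name.
level_from_choice() {
    case "$1" in
        1) echo "none" ;;
        2) echo "basic" ;;
        3) echo "enhanced" ;;
        *) return 1 ;;   # anything else is rejected so the caller can ask again
    esac
}
```

A caller could loop on the prompt until level_from_choice succeeds, which keeps invalid input from reaching the preprocessing step.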

To set a default level without being prompted, edit transcribe.conf rather than the script, since config file values take precedence over script defaults:

Open transcribe.conf in a text editor
Find the PREPROCESS_LEVEL line
Change it to your preferred level (none, basic, or enhanced)
Save the file

Model

This script uses the MLX-Whisper Medium model by default, which provides:

Excellent accuracy for transcription in 90+ languages
~10x realtime processing on Apple Silicon (M1/M2/M3)
Memory usage: ~5-6GB RAM
Optimized for all languages including English, Dutch, French, German, Spanish, etc.
Easily configurable - switch to other models (tiny/base/small/large) via config file

Directory Structure

After running, your directory will contain:

batch-mp3-to-text/
├── transcribe_workflow.sh       # Main script
├── launch_transcription.command # GUI launcher for macOS (double-click to run)
├── transcribe.conf              # Configuration file with detailed settings and comments
├── audio_files/                 # Input: Your MP3 files
├── transcriptions/              # Output: Text transcripts
│   ├── combined_transcript_*.txt
│   └── *_transcript.txt
├── preprocessed_audio/          # Processed audio files
├── temp_chunks/                 # Temporary chunks (auto-cleaned)
└── transcription_log_*.txt      # Detailed logs
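
Given this layout, a short helper can list which input files do not have a transcript yet. This is a sketch only: the function name is hypothetical, and it assumes each transcript is named after its MP3 file with a _transcript.txt suffix, as the tree above suggests.

```shell
# Hypothetical sketch: print MP3 files that have no matching transcript yet.
pending_files() {
    local in_dir="${1:-audio_files}"
    local out_dir="${2:-transcriptions}"
    local f
    local base
    for f in "$in_dir"/*.mp3; do
        [ -e "$f" ] || continue                 # directory holds no MP3 files
        base="$(basename "$f" .mp3)"
        # Assumed naming scheme: <name>.mp3 -> <name>_transcript.txt
        [ -e "$out_dir/${base}_transcript.txt" ] || printf '%s\n' "$f"
    done
}
```

Running `pending_files` from the project directory would print one path per line for every file still waiting to be transcribed.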

Preprocessing Levels

None

No audio processing
Use original files as-is
Fastest option

Basic (Recommended)

Convert to mono
Normalize volume
Remove rumble (highpass filter)
Remove hiss (lowpass filter)
Best balance of speed and quality

Enhanced

All Basic features
Advanced noise reduction
Best for noisy recordings
Slower processing
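
The levels above could be expressed as ffmpeg audio filter chains along the following lines. The filters used (highpass, lowpass, loudnorm, afftdn) are standard ffmpeg filters, but the cutoff frequencies, the choice of afftdn for noise reduction, and both function names are assumptions for illustration. The actual script may use different filters and values.

```shell
# Hypothetical sketch: build an ffmpeg -af filter chain for a preprocessing level.
filter_chain_for_level() {
    case "$1" in
        none)     echo "" ;;
        basic)    echo "highpass=f=80,lowpass=f=8000,loudnorm" ;;
        enhanced) echo "highpass=f=80,lowpass=f=8000,afftdn,loudnorm" ;;
        *)        return 1 ;;
    esac
}

# Apply the chain to one file; "-ac 1" downmixes to mono.
preprocess_file() {
    local input="$1"
    local output="$2"
    local level="$3"
    local filters
    filters="$(filter_chain_for_level "$level")" || return 1
    if [ -z "$filters" ]; then
        cp "$input" "$output"        # level "none": use the file as-is
    else
        ffmpeg -nostdin -y -i "$input" -ac 1 -af "$filters" "$output"
    fi
}
```

Keeping the chain in its own function makes it easy to see that the enhanced level is the basic chain plus one extra denoising stage, which is also why it runs slower.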

Memory Management Tips (8GB Systems)

If you're running on an 8GB M1/M2 Mac, follow these tips for optimal performance:

Before Starting Transcription

1. Free up RAM by closing memory-heavy apps:

# Close browsers (biggest memory hogs)
killall "Google Chrome" Safari Firefox

# Close development tools
killall "Visual Studio Code" Xcode

# Close communication apps
killall Slack Discord Teams

2. Clear memory cache:

sudo purge  # Requires password, but frees up inactive memory

3. Check available memory:

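# Page size is queried rather than hard-coded: Apple Silicon uses 16 KB pages, not 4 KB
vm_stat | awk -v ps="$(sysctl -n hw.pagesize)" '/Pages free/ {print int($3 * ps / 1048576) " MB free"}'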

You want at least 5GB free before starting.
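
The arithmetic behind that check is page count times page size, converted to megabytes. The sketch below separates the calculation from the macOS-specific commands so it can be reasoned about on its own. Both function names are hypothetical, and the 5120 MB default simply restates the 5GB guideline above.

```shell
# Hypothetical sketch: convert a page count into megabytes.
pages_to_mb() {
    # $1 = number of pages, $2 = page size in bytes
    echo $(( $1 * $2 / 1048576 ))
}

# Succeeds when the free memory meets the requirement.
enough_memory() {
    # $1 = free MB, $2 = required MB (defaults to 5120, i.e. 5 GB)
    [ "$1" -ge "${2:-5120}" ]
}

# On macOS the inputs could be gathered like this (not executed here):
#   pages=$(vm_stat | awk '/Pages free/ {print int($3)}')
#   free_mb=$(pages_to_mb "$pages" "$(sysctl -n hw.pagesize)")
#   enough_memory "$free_mb" || echo "Close some applications first"
```

The page size matters: the same 327680 free pages amount to 5120 MB with 16384-byte pages but only 1280 MB if 4096 bytes is assumed.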

Memory Requirements

The Medium model requires:

Minimum: 5GB free RAM
Recommended: 6GB+ free RAM for optimal performance

For 8GB systems: Close all unnecessary applications before running to ensure 5-6GB is available.

Overnight Processing

For batch processing multiple files overnight:

Using GUI Launcher (Easiest):

Restart your Mac (clears all memory)
Close all applications
Double-click launch_transcription.command
Keep Mac plugged in and leave it running
- ✅ Caffeinate is already enabled (prevents sleep automatically)

Using Command Line:

# 1. Restart your Mac (clears all memory)
# 2. Close all applications
# 3. Run with caffeinate to prevent sleep:
caffeinate -i ./transcribe_workflow.sh

# Keep Mac plugged in and leave it running

Monitor Memory During Processing

Open a second terminal to watch memory usage:

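# Refreshes every 5 seconds; a plain loop is used because watch is not installed on macOS by default. Stop with Ctrl+C.
while true; do vm_stat | awk -v ps="$(sysctl -n hw.pagesize)" '/Pages free/ {print int($3 * ps / 1048576) " MB free"}'; sysctl vm.swapusage; sleep 5; done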

Warning signs:

Free memory < 500MB → May slow down
Swap usage > 3GB → Close more applications and restart
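
These thresholds can also be checked automatically. The following sketch parses the "used" figure from a line in the format printed by sysctl vm.swapusage and compares both values against the limits above. The function names and message wording are hypothetical, and 3GB is taken as 3072 MB.

```shell
# Hypothetical sketch: extract used swap (whole MB) from a vm.swapusage line on stdin.
swap_used_mb() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "used") {      # fields look like: used = 232.25M
                v = $(i + 2)
                sub(/M$/, "", v)
                print int(v)
                exit
            }
    }'
}

# Print a warning for each exceeded threshold; return 1 if any was exceeded.
memory_warnings() {
    local free_mb="$1"
    local swap_mb="$2"
    local rc=0
    if [ "$free_mb" -lt 500 ]; then
        echo "low free memory: ${free_mb} MB"
        rc=1
    fi
    if [ "$swap_mb" -gt 3072 ]; then
        echo "high swap usage: ${swap_mb} MB"
        rc=1
    fi
    return "$rc"
}
```

On a Mac, the swap value could be obtained with `sysctl vm.swapusage | swap_used_mb` and passed to memory_warnings together with the free-memory figure.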

Quick Memory Check Commands

# See memory hogs
ps aux | sort -nrk 4 | head -10

# Free up memory
sudo purge

# Prevent sleep during processing
caffeinate -i ./transcribe_workflow.sh

Troubleshooting

Progress bar not showing

The progress bar updates in real time during transcription. If you don't see it, the transcription is still running; check the log file for details.

"Fetching 4 files" message

This is normal - it's loading the AI model components (only takes a second).

Out of memory errors

Close other applications (browsers, IDEs, etc.)
Run sudo purge to free up inactive memory
Restart your Mac for a fresh start
Process files one at a time instead of in batch
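
One way to process files one at a time is to hand the script a temporary directory containing a single MP3 per run, relying on the custom input directory argument described earlier. This is a sketch under that assumption. The function name is hypothetical, and the script will still show its usual prompts on each run unless a default level is configured.

```shell
# Hypothetical sketch: run a command once per MP3, each time on a directory
# that contains only that one file.
process_one_at_a_time() {
    local src_dir="$1"
    shift                                  # remaining arguments form the command
    local f
    local work_dir
    for f in "$src_dir"/*.mp3; do
        [ -e "$f" ] || continue            # no MP3 files found
        work_dir="$(mktemp -d)"
        cp "$f" "$work_dir/"
        "$@" "$work_dir"                   # e.g. ./transcribe_workflow.sh <dir>
        rm -rf "$work_dir"
    done
}
```

Usage would look like `process_one_at_a_time audio_files ./transcribe_workflow.sh`, which lets memory be released between files at the cost of reloading the model each time.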

Script hangs

Check transcription_log_*.txt for errors
Verify MLX-Whisper is installed: pip3 show mlx-whisper
Ensure Python 3.9+: python3 --version

Permission denied

Make the script executable:

chmod +x transcribe_workflow.sh

Performance

On Apple Silicon M1/M2/M3:

Processing speed: ~10x realtime, so a 1-hour recording transcribes in approximately 6 minutes
MLX-Whisper is 8-10x faster than whisper.cpp on Apple Silicon
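
To plan a batch, the expected processing time can be estimated by dividing the audio length by the speed factor. A small sketch, with a hypothetical function name and the ~10x figure above used as the default; real speed depends on the model, preprocessing level, and available memory.

```shell
# Hypothetical sketch: estimate processing minutes, rounded up.
estimate_minutes() {
    local audio_minutes="$1"
    local speedup="${2:-10}"               # ~10x realtime by default
    echo $(( (audio_minutes + speedup - 1) / speedup ))
}

# The audio length in seconds could be read with ffprobe (not executed here):
#   ffprobe -v error -show_entries format=duration \
#           -of default=noprint_wrappers=1:nokey=1 recording.mp3
```

For example, `estimate_minutes 60` gives 6, matching the figure above, and `estimate_minutes 90` gives 9.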

Language Support

Default: Dutch (nl) - This script is optimized for Dutch recordings

Why Language Setting Matters for Quality

⚠️ Setting the correct language significantly improves transcription accuracy!

Benefits of specifying the language:

✅ Better Accuracy - Decodes with the specified language instead of relying on auto-detection, which can guess wrong on short or noisy audio
✅ Faster Processing - Skips auto-detection, saving time
✅ Correct Spelling - Dutch words like "bijvoorbeeld", "misschien", "waarschijnlijk" transcribed properly
✅ Context Understanding - Recognizes language-specific patterns and expressions
✅ Proper Nouns - Dutch names and places recognized correctly

Example:

❌ Without language setting: "I can for build" (misheard English)
✅ With Dutch setting: "ik kan bijvoorbeeld" (correct Dutch)

Changing the Language

Option 1: Interactive (GUI Launcher)

When using launch_transcription.command, you'll be prompted to select:

Dutch (Nederlands) - Default
English
Other (use config file setting)

Option 2: Configuration File

Edit transcribe.conf:

LANGUAGE="nl"    # Dutch (DEFAULT - optimized for this script)
# LANGUAGE="en"  # English
# LANGUAGE="fr"  # French
# LANGUAGE="de"  # German
# LANGUAGE="es"  # Spanish
# ... and 85+ more languages!

Supported Languages

90+ languages supported including:

Western European: English (en), Dutch (nl), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt)
Eastern European: Polish (pl), Russian (ru), Ukrainian (uk), Czech (cs), Romanian (ro)
Asian: Japanese (ja), Chinese (zh), Korean (ko), Hindi (hi), Vietnamese (vi), Thai (th)
Middle Eastern: Arabic (ar), Turkish (tr), Hebrew (he), Persian (fa)
Nordic: Swedish (sv), Norwegian (no), Danish (da), Finnish (fi)
And 60+ more languages!

See Whisper documentation for the complete list.

Logs

Detailed logs are saved to transcription_log_TIMESTAMP.txt including:

System information
Memory usage
Processing speeds
Errors and warnings
File locations
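
When something goes wrong, the most recent log is usually the one to inspect. The sketch below finds it and prints lines that mention errors or warnings. It assumes the TIMESTAMP in the file name sorts chronologically and that problem lines contain the words "error" or "warning"; both are assumptions about the log format, and the function names are hypothetical.

```shell
# Hypothetical sketch: print the path of the newest transcription log.
latest_log() {
    local dir="${1:-.}"
    local f
    local latest=""
    # Globs expand in sorted order, so the last match has the latest timestamp
    # (assuming the timestamp format sorts chronologically).
    for f in "$dir"/transcription_log_*.txt; do
        [ -e "$f" ] && latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Show numbered lines mentioning errors or warnings in the newest log.
show_log_problems() {
    local log
    log="$(latest_log "${1:-.}")" || { echo "no log found" >&2; return 1; }
    grep -inE 'error|warning' "$log"
}
```

Running `show_log_problems` from the project directory prints matching lines with their line numbers, or nothing if the log is clean.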

Security

No data is sent to external servers
All processing happens locally on your Mac
Models are downloaded once and cached in ~/.cache/huggingface/

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

GNU General Public License v3.0 - see LICENSE file for details.

Author

Douwe

Acknowledgments

MLX-Whisper - Fast Whisper implementation for Apple Silicon
OpenAI Whisper - Original Whisper model
ffmpeg - Audio processing

Version History

1.1.0 - Code quality improvements
- ✅ Security: Fixed eval vulnerability in log_command function
- ⚙️ Configuration: Added external transcribe.conf file for easy customization
- 📝 Readability: Split large functions into smaller, focused sub-functions
- 🌍 Internationalization: Standardized all UI text to international English
- 🇳🇱 Language Optimization: Defaults to Dutch transcription with interactive language selection in GUI launcher
- 📖 Documentation: Enhanced README explaining why language setting matters for quality
- 🔧 Maintainability: Improved code structure and comments

1.0.0 - Initial release
- Batch MP3 transcription
- Apple Silicon optimization
- Audio preprocessing
- Real-time progress display