Batch MP3 to Text

Script to convert MP3 audio files to text transcriptions. Optimized for Apple Silicon chips (M1/M2/M3).

Features

  • 🚀 Batch processing - Transcribe multiple MP3 files automatically
  • 🎯 Apple Silicon optimized - Uses MLX-Whisper for 8-10x faster processing on M1/M2/M3 chips
  • 🎚️ Audio preprocessing - Optional noise reduction and normalization
  • 📊 Real-time progress - Live progress bars during transcription
  • 🧠 Smart memory management - Handles large files with automatic memory cleanup
  • 🌍 Multi-language support - Supports 90+ languages including English, Dutch, French, German, Spanish, and more
  • ⚙️ Easy configuration - External config file for customization without editing scripts

Requirements

  • macOS with Apple Silicon (M1/M2/M3)
  • Python 3.9+
  • ffmpeg - For audio processing
  • mlx-whisper - ML model for transcription

Installation

1. Install Homebrew (if not installed)

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Install dependencies

# Install ffmpeg
brew install ffmpeg

# Install Python 3.11
brew install python@3.11

# Install MLX-Whisper
pip3 install mlx-whisper

3. Download this script

git clone https://github.com/YOUR_USERNAME/batch-mp3-to-text.git
cd batch-mp3-to-text
chmod +x transcribe_workflow.sh

Quick Start

Option 1: GUI Launcher (Easiest for macOS)

  1. Place your MP3 files in the audio_files/ directory
  2. Double-click launch_transcription.command
  3. The launcher will:
    • Show you which files will be transcribed
    • Wait for you to press ENTER to start
    • Ask you to choose a preprocessing level (1, 2, or 3)
    • Run the transcription automatically
    • Show results when complete
    • Your Mac will not sleep during transcription (uses caffeinate)

Option 2: Command Line

  1. Place your MP3 files in the audio_files/ directory
  2. Run the script:
./transcribe_workflow.sh
  3. Choose your preprocessing level when prompted (1, 2, or 3)
  4. Wait for transcriptions to complete
  5. Find your transcripts in transcriptions/

To prevent your Mac from sleeping during long transcriptions, run the script through caffeinate (the GUI launcher already does this by default):

caffeinate -i ./transcribe_workflow.sh

Custom Input Directory

./transcribe_workflow.sh /path/to/your/audio/files
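Before pointing the script at another folder, it can help to confirm that the folder actually contains MP3 files. The following is a minimal sketch and not part of the repository: the helper name list_mp3s is hypothetical, and it assumes only the top level of the directory matters (this README does not say whether subfolders are scanned).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of transcribe_workflow.sh):
# list the MP3 files directly inside a directory, sorted by name.
list_mp3s() {
  local dir="${1:-audio_files}"
  # -iname matches .mp3 and .MP3; -maxdepth 1 skips subfolders.
  find "$dir" -maxdepth 1 -type f -iname '*.mp3' | sort
}

# Usage: show what would be picked up from a custom input directory.
input_dir="${1:-audio_files}"
if [ -d "$input_dir" ]; then
  list_mp3s "$input_dir"
else
  echo "Directory not found: $input_dir" >&2
fi
```

If the list is empty, check the path and the file extensions before starting a long run.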

Configuration

Configuration File: transcribe.conf

The script uses an external configuration file (transcribe.conf) that allows you to customize all settings without modifying the script itself. This file is automatically loaded when the script runs.

What's in the config file:

  • 🎚️ Preprocessing levels (none/basic/enhanced)
  • 🤖 Whisper model selection (tiny/base/small/medium/large)
  • 🌍 Language settings (90+ languages supported)
  • 📏 File size limits and chunking behavior
  • ⚙️ Feature toggles (skip existing, keep originals, diarization)
  • 🎛️ Advanced audio settings (sample rate, bitrate, filters)
  • 📊 Output preferences (combined transcripts, naming, formats)

Variable precedence: Environment variables > Config file > Script defaults

This means you can override config file settings by setting environment variables when running the script.
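The precedence rule can be illustrated with a small bash sketch. This is an assumption about how such a lookup could work, not the code in transcribe_workflow.sh; the function name resolve_setting is hypothetical, and an environment variable only counts here when it is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "environment > config file > script default".
# resolve_setting NAME DEFAULT [CONFIG_FILE] prints the effective value.
resolve_setting() {
  local name="$1" default="$2" conf="${3:-./transcribe.conf}" value=""
  if [ -n "${!name:-}" ]; then
    # 1. A non-empty variable already set in the environment wins.
    value="${!name}"
  elif [ -f "$conf" ]; then
    # 2. Otherwise source the config in a subshell, so it cannot
    #    overwrite variables in the calling shell.
    value="$( . "$conf" >/dev/null 2>&1; printf '%s' "${!name:-}" )"
  fi
  # 3. Fall back to the script default when nothing else is set.
  printf '%s\n' "${value:-$default}"
}

# Usage: prints the environment value if set, else the value from
# ./transcribe.conf, else "basic".
echo "Preprocessing: $(resolve_setting PREPROCESS_LEVEL basic)"
```

With a rule like this, a one-off override such as PREPROCESS_LEVEL="enhanced" ./transcribe_workflow.sh would take effect without touching transcribe.conf.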

  1. The config file is already included (transcribe.conf)

  2. Edit the config file:

    nano transcribe.conf
    # or
    open -e transcribe.conf  # Opens in TextEdit on macOS
    
  3. Customize your settings:

    # Example: Change language to English
    LANGUAGE="en"
    
    # Example: Use enhanced preprocessing for noisy audio
    PREPROCESS_LEVEL="enhanced"
    
    # Example: Use the large model for best accuracy
    WHISPER_MODEL="mlx-community/whisper-large-v3-mlx"
    
  4. Run the script normally - it will automatically load your config

The config file includes 280+ lines of detailed comments explaining each setting, all available options, and what each change means for performance and quality.

Choosing Preprocessing Level

When you run the script, you'll be asked to choose a preprocessing level:

Choose audio preprocessing level:
  1) None      - No preprocessing (fastest)
  2) A little  - Mono + normalization + basic filtering (recommended)
  3) A lot     - Advanced noise reduction (best for noisy audio)

Enter your choice (1, 2, or 3):

The preprocessing level affects audio quality before transcription:

  • Option 1 (None) - No processing, fastest
  • Option 2 (A little) - Recommended for most files
  • Option 3 (A lot) - Best for noisy recordings, slower

To set a default level without being prompted, edit transcribe_workflow.sh:

  1. Open transcribe_workflow.sh in a text editor
  2. Find the line: PREPROCESS_LEVEL="basic"
  3. Change to your preferred level (none, basic, or enhanced)
  4. Save the file

Model

This script uses the MLX-Whisper Medium model by default, which provides:

  • Excellent accuracy for transcription in 90+ languages
  • ~10x realtime processing on Apple Silicon (M1/M2/M3)
  • Memory usage: ~5-6GB RAM
  • Optimized for all languages including English, Dutch, French, German, Spanish, etc.
  • Easily configurable - switch to other models (tiny/base/small/large) via config file

Directory Structure

After running, your directory will contain:

batch-mp3-to-text/
├── transcribe_workflow.sh       # Main script
├── launch_transcription.command # GUI launcher for macOS (double-click to run)
├── transcribe.conf              # Configuration file with detailed settings and comments
├── audio_files/                 # Input: Your MP3 files
├── transcriptions/              # Output: Text transcripts
│   ├── combined_transcript_*.txt
│   └── *_transcript.txt
├── preprocessed_audio/          # Processed audio files
├── temp_chunks/                 # Temporary chunks (auto-cleaned)
└── transcription_log_*.txt      # Detailed logs

Preprocessing Levels

None

  • No audio processing
  • Use original files as-is
  • Fastest option

Basic

  • Convert to mono
  • Normalize volume
  • Remove rumble (highpass filter)
  • Remove hiss (lowpass filter)
  • Best balance of speed and quality

Enhanced

  • All Basic features
  • Advanced noise reduction
  • Best for noisy recordings
  • Slower processing
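To make the levels concrete, here is a rough sketch of how they could map onto ffmpeg audio filters. It is an illustration only: the function names are hypothetical, the cutoff frequencies (80 Hz and 8000 Hz) are assumed values, and the choice of loudnorm for normalization and afftdn for noise reduction is a guess at suitable ffmpeg filters, not the script's actual settings (those live in transcribe.conf).

```shell
#!/usr/bin/env bash
# Hypothetical mapping from preprocessing level to an ffmpeg -af chain.
filters_for_level() {
  case "$1" in
    none)     echo "" ;;
    # highpass removes rumble, lowpass removes hiss, loudnorm evens out volume.
    basic)    echo "highpass=f=80,lowpass=f=8000,loudnorm" ;;
    # enhanced adds an FFT-based denoiser before normalization.
    enhanced) echo "highpass=f=80,lowpass=f=8000,afftdn,loudnorm" ;;
    *)        echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# preprocess INPUT OUTPUT LEVEL: copy as-is for "none", otherwise
# downmix to mono (-ac 1) and apply the filter chain.
preprocess() {
  local filters
  filters="$(filters_for_level "$3")" || return 1
  if [ -z "$filters" ]; then
    cp "$1" "$2"
  else
    ffmpeg -nostdin -y -i "$1" -ac 1 -af "$filters" "$2"
  fi
}

# Example: preprocess audio_files/talk.mp3 preprocessed_audio/talk.mp3 basic
```

Each extra filter adds processing time, which is why the Enhanced level is slower than Basic.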

Memory Management Tips (8GB Systems)

If you're running on an 8GB M1/M2 Mac, follow these tips for optimal performance:

Before Starting Transcription

1. Free up RAM by closing memory-heavy apps (save your work first, because killall ends apps without asking to save):

# Close browsers (biggest memory hogs)
killall "Google Chrome" Safari Firefox

# Close development tools
killall "Visual Studio Code" Xcode

# Close communication apps
killall Slack Discord Teams

2. Clear memory cache:

sudo purge  # Requires password, but frees up inactive memory

3. Check available memory:

vm_stat | awk -v page="$(sysctl -n hw.pagesize)" '/Pages free/ {print int($3 * page / 1048576) " MB free"}'

You want at least 5GB free before starting.
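The check can be wrapped in a small pre-flight test. This is a sketch under assumptions: the function name free_mb is hypothetical, and it reads the page size from the first line of the vm_stat output rather than hard-coding it. Note that "Pages free" is a conservative figure, since macOS can also reclaim inactive memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `vm_stat` output on stdin, print free memory in MB.
free_mb() {
  awk '
    # Header looks like: "... (page size of 16384 bytes)"
    /page size of/ { for (i = 1; i <= NF; i++) if ($i == "of") page = $(i + 1) }
    # Value has a trailing dot ("12345."), which awk ignores in arithmetic.
    /^Pages free/  { free = $3 }
    END { printf "%d\n", free * page / 1048576 }
  '
}

# Usage: warn when less than 5 GB (5120 MB) is free. Only runs on macOS.
if command -v vm_stat >/dev/null 2>&1; then
  mb="$(vm_stat | free_mb)"
  if [ "$mb" -lt 5120 ]; then
    echo "Only ${mb} MB free: close more apps before transcribing."
  else
    echo "${mb} MB free: OK to start."
  fi
fi
```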

Memory Requirements

The Medium model requires:

  • Minimum: 5GB free RAM
  • Recommended: 6GB+ free RAM for optimal performance

For 8GB systems: Close all unnecessary applications before running to ensure 5-6GB is available.

Overnight Processing

For batch processing multiple files overnight:

Using GUI Launcher (Easiest):

  1. Restart your Mac (clears all memory)
  2. Close all applications
  3. Double-click launch_transcription.command
  4. Keep Mac plugged in and leave it running
    • Caffeinate is already enabled (prevents sleep automatically)

Using Command Line:

# 1. Restart your Mac (clears all memory)
# 2. Close all applications
# 3. Run with caffeinate to prevent sleep:
caffeinate -i ./transcribe_workflow.sh

# Keep Mac plugged in and leave it running

Monitor Memory During Processing

Open a second terminal to watch memory usage (refreshes every 5 seconds; press Ctrl+C to stop):

while true; do vm_stat | awk -v page="$(sysctl -n hw.pagesize)" '/Pages free/ {print int($3 * page / 1048576) " MB free"}'; sysctl vm.swapusage; sleep 5; done

Warning signs:

  • Free memory < 500MB → May slow down
  • Swap usage > 3GB → Close more applications and restart
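The swap threshold can be checked automatically. The sketch below is an assumption-laden helper, not part of the script: swap_used_mb is a hypothetical name, and it expects the usual `sysctl vm.swapusage` line of the form "vm.swapusage: total = 2048.00M  used = 1189.25M  free = 858.75M".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a `sysctl vm.swapusage` line on stdin and
# print the used swap in whole MB.
swap_used_mb() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "used") {       # fields: used = 1189.25M
        v = $(i + 2)
        sub(/M$/, "", v)        # strip the unit suffix
        printf "%d\n", v
      }
  }'
}

# Usage: warn above 3 GB (3072 MB) of swap. Only runs where the key exists.
if sysctl vm.swapusage >/dev/null 2>&1; then
  used="$(sysctl vm.swapusage | swap_used_mb)"
  if [ "$used" -gt 3072 ]; then
    echo "Swap usage is ${used} MB: close more applications and restart."
  else
    echo "Swap usage is ${used} MB."
  fi
fi
```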

Quick Memory Check Commands

# See memory hogs
ps aux | sort -nrk 4 | head -10

# Free up memory
sudo purge

# Prevent sleep during processing
caffeinate -i ./transcribe_workflow.sh

Troubleshooting

Progress bar not showing

The progress bar updates in real time during transcription. If you don't see it, the transcription is still running; check the log file for details.

"Fetching 4 files" message

This is normal - it's loading the AI model components (only takes a second).

Out of memory errors

  • Close other applications (browsers, IDEs, etc.)
  • Run sudo purge to free up inactive memory
  • Restart your Mac for a fresh start
  • Process files one at a time instead of in batch

Script hangs

  • Check transcription_log_*.txt for errors
  • Verify MLX-Whisper is installed: pip3 show mlx-whisper
  • Ensure Python 3.9+: python3 --version

Permission denied

Make the script executable:

chmod +x transcribe_workflow.sh

Performance

On Apple Silicon M1/M2/M3:

  • Processing speed: ~10x realtime, so a 1-hour recording transcribes in approximately 6 minutes
  • MLX-Whisper is 8-10x faster than whisper.cpp on Apple Silicon
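The ~10x figure gives a quick way to estimate how long a batch will take. The sketch below is only an approximation under that assumption: the function names are hypothetical, real speed depends on the model, the preprocessing level, and available memory, and the duration lookup relies on ffprobe (installed together with ffmpeg).

```shell
#!/usr/bin/env bash
# Hypothetical estimate: minutes of processing for a given audio length,
# rounded up, assuming a fixed realtime speedup (default 10x).
estimate_minutes() {
  local audio_seconds="$1" speedup="${2:-10}"
  echo $(( (audio_seconds + speedup * 60 - 1) / (speedup * 60) ))
}

# Duration of an audio file in whole seconds, read with ffprobe.
audio_seconds() {
  local dur
  dur="$(ffprobe -v error -show_entries format=duration \
         -of default=noprint_wrappers=1:nokey=1 "$1")"
  echo "${dur%.*}"   # drop the fractional part
}

# Usage: pass an MP3 path as the first argument.
if [ -f "${1:-}" ]; then
  secs="$(audio_seconds "$1")"
  echo "$1: about $(estimate_minutes "$secs") min of processing"
fi
```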

Language Support

Default: Dutch (nl) - This script is optimized for Dutch recordings

Why Language Setting Matters for Quality

⚠️ Setting the correct language significantly improves transcription accuracy!

Benefits of specifying the language:

  • Better Accuracy - Tells the model which language to decode, so it cannot mis-detect the language and transcribe in the wrong one
  • Faster Processing - Skips auto-detection, saving time
  • Correct Spelling - Dutch words like "bijvoorbeeld", "misschien", "waarschijnlijk" transcribed properly
  • Context Understanding - Recognizes language-specific patterns and expressions
  • Proper Nouns - Dutch names and places recognized correctly

Example:

  • Without language setting: "I can for build" (misheard English)
  • With Dutch setting: "ik kan bijvoorbeeld" (correct Dutch)

Changing the Language

Option 1: Interactive (GUI Launcher)

When using launch_transcription.command, you'll be prompted to select:

  • Dutch (Nederlands) - Default
  • English
  • Other (use config file setting)

Option 2: Configuration File

Edit transcribe.conf:

LANGUAGE="nl"    # Dutch (DEFAULT - optimized for this script)
# LANGUAGE="en"  # English
# LANGUAGE="fr"  # French
# LANGUAGE="de"  # German
# LANGUAGE="es"  # Spanish
# ... and 85+ more languages!

Supported Languages

90+ languages supported including:

  • Western European: English (en), Dutch (nl), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt)
  • Eastern European: Polish (pl), Russian (ru), Ukrainian (uk), Czech (cs), Romanian (ro)
  • Asian: Japanese (ja), Chinese (zh), Korean (ko), Hindi (hi), Vietnamese (vi), Thai (th)
  • Middle Eastern: Arabic (ar), Turkish (tr), Hebrew (he), Persian (fa)
  • Nordic: Swedish (sv), Norwegian (no), Danish (da), Finnish (fi)
  • And 60+ more languages!

See Whisper documentation for the complete list.

Logs

Detailed logs are saved to transcription_log_TIMESTAMP.txt including:

  • System information
  • Memory usage
  • Processing speeds
  • Errors and warnings
  • File locations
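After a long run, a quick way to review the result is to scan the newest log for problems. This is a minimal sketch: latest_log is a hypothetical helper, and searching for the words "error" and "warning" is an assumption about the log wording, which this README does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified transcription log
# in a directory (default: current directory), or nothing if none exist.
latest_log() {
  ls -t "${1:-.}"/transcription_log_*.txt 2>/dev/null | head -n 1
}

# Usage: show numbered lines that mention errors or warnings.
log="$(latest_log)"
if [ -n "$log" ]; then
  grep -inE 'error|warning' "$log" || echo "No errors or warnings in $log"
else
  echo "No transcription logs found here."
fi
```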

Security

  • No data is sent to external servers
  • All processing happens locally on your Mac
  • Models are downloaded once and cached in ~/.cache/huggingface/

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

GNU General Public License v3.0 - see LICENSE file for details.

Author

Douwe

Acknowledgments

Version History

  • 1.1.0 - Code quality improvements

    • Security: Fixed eval vulnerability in log_command function
    • ⚙️ Configuration: Added external transcribe.conf file for easy customization
    • 📝 Readability: Split large functions into smaller, focused sub-functions
    • 🌍 Internationalization: Standardized all UI text to international English
    • 🇳🇱 Language Optimization: Defaults to Dutch transcription with interactive language selection in GUI launcher
    • 📖 Documentation: Enhanced README explaining why language setting matters for quality
    • 🔧 Maintainability: Improved code structure and comments
  • 1.0.0 - Initial release

    • Batch MP3 transcription
    • Apple Silicon optimization
    • Audio preprocessing
    • Real-time progress display