dosch 7d736f10cb Initial commit: WordPress to Markdown migration tool
- Add wordpress-to-markdown.py script for converting WP exports
- Add comprehensive README with usage instructions
- Add .gitignore for Python and test files
2025-10-02 08:54:44 +02:00

WordPress to Markdown Migration Tool

A Python script to convert WordPress XML exports and media files into a clean, static site generator-agnostic Markdown structure.

Features

  • Converts WordPress posts and pages to Markdown with YAML frontmatter
  • Copies and organizes all media files (images, audio, video, documents)
  • URL-independent (works regardless of original domain)
  • Preserves metadata (categories, tags, authors, dates)
  • Converts HTML content to Markdown
  • Compatible with Hugo, MkDocs, Jekyll, Gatsby, and other static site generators
  • Generates metadata.json for reference

Requirements

  • Python 3.6 or higher (no external dependencies required)
  • WordPress XML export file
  • wp-content folder (optional, but recommended for media files)

Installation

  1. Clone or download this repository
  2. Make the script executable (optional):
    chmod +x wordpress-to-markdown.py
    

Getting Your WordPress Data

Step 1: Export XML from WordPress

  1. Log into your WordPress admin panel
  2. Go to Tools → Export
  3. Select All content
  4. Click Download Export File
  5. Save the XML file (usually named like wordpress.YYYY-MM-DD.xml)
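
Before running the migration, it can help to confirm that the export contains the posts and pages you expect. The sketch below is not part of the script: it streams the export with Python's standard library and counts items by their wp:post_type element. The function name count_post_types is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_post_types(xml_path):
    """Count <item> elements in a WordPress export by their wp:post_type."""
    counts = Counter()
    # iterparse streams the file, so large exports need not fit in memory.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "item":
            continue
        post_type = "unknown"
        for child in elem:
            # Compare the local tag name only, so any export namespace version matches.
            if child.tag.rsplit("}", 1)[-1] == "post_type":
                post_type = (child.text or "unknown").strip()
                break
        counts[post_type] += 1
        elem.clear()  # release the children of the item just processed
    return counts
```

Calling count_post_types("wordpress.2024-01-15.xml") returns a Counter keyed by post type, which typically includes entries such as post, page, and attachment.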

Step 2: Get wp-content Folder

Option A: Via FTP/SFTP

  1. Connect to your WordPress hosting via FTP/SFTP
  2. Navigate to your WordPress installation directory
  3. Download the entire wp-content folder

Option B: Via Hosting Control Panel

  1. Log into your hosting control panel (cPanel, Plesk, etc.)
  2. Use File Manager to locate wp-content
  3. Compress and download it

Option C: Via SSH

# On your server
cd /path/to/wordpress
tar -czf wp-content.tar.gz wp-content/
# Then download wp-content.tar.gz

Usage

Basic Usage

python3 wordpress-to-markdown.py \
  --xml wordpress.2024-01-15.xml \
  --content wp-content \
  --output my-site

Parameters

  • --xml (required): Path to WordPress XML export file
  • --content (optional): Path to wp-content directory (for media files)
  • --output (required): Directory where converted files will be saved
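
For readers who want to wrap or extend the script, the three options above could be declared with argparse roughly as follows. This is an illustrative sketch, not the script's actual source; build_parser and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the three documented options."""
    parser = argparse.ArgumentParser(
        description="Convert a WordPress export to Markdown."
    )
    parser.add_argument("--xml", required=True,
                        help="Path to WordPress XML export file")
    parser.add_argument("--content", default=None,
                        help="Path to wp-content directory (optional, for media files)")
    parser.add_argument("--output", required=True,
                        help="Directory where converted files will be saved")
    return parser
```

With this layout, omitting --content leaves args.content as None, which a caller can use to skip the media copy step.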

Example

# Full migration with media files
python3 wordpress-to-markdown.py \
  --xml ~/Downloads/wordpress.xml \
  --content ~/Downloads/wp-content \
  --output ~/Sites/my-migrated-site

# Posts/pages only (no media)
python3 wordpress-to-markdown.py \
  --xml ~/Downloads/wordpress.xml \
  --output ~/Sites/my-migrated-site

Output Structure

The script creates the following structure:

my-site/
├── content/
│   ├── posts/
│   │   ├── 2024-01-15-first-post.md
│   │   ├── 2024-02-20-second-post.md
│   │   └── ...
│   └── pages/
│       ├── about.md
│       ├── contact.md
│       └── ...
├── media/
│   ├── images/
│   │   └── 2024/01/photo.jpg
│   ├── audio/
│   │   └── 2024/02/podcast.mp3
│   ├── video/
│   │   └── 2024/03/demo.mp4
│   ├── documents/
│   │   └── 2024/01/guide.pdf
│   └── other/
│       └── ...
└── metadata.json

File Naming Convention

  • Posts: YYYY-MM-DD-slug.md (e.g., 2024-01-15-hello-world.md)
  • Pages: slug.md (e.g., about.md)
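
The naming convention can be expressed in a few lines of Python. The helper names post_filename and page_filename are hypothetical, and the timestamp format is assumed to match the date shown in the frontmatter (for example 2024-01-15T10:30:00).

```python
from datetime import datetime


def post_filename(date_str, slug):
    """Build 'YYYY-MM-DD-slug.md' from an ISO-style timestamp and a slug."""
    # strptime is used instead of fromisoformat to stay compatible with Python 3.6.
    date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S")
    return f"{date:%Y-%m-%d}-{slug}.md"


def page_filename(slug):
    """Pages carry no date prefix, only the slug."""
    return f"{slug}.md"
```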

Markdown Frontmatter

Each markdown file includes YAML frontmatter with metadata:

---
title: My First Post
date: 2024-01-15T10:30:00
author: John Doe
excerpt: A brief description of the post
categories:
  - Technology
  - Web Development
tags:
  - WordPress
  - Migration
  - Markdown
---

Post content here...
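
If you need to regenerate or post-process frontmatter without installing a YAML library, a block like the one above can be produced with the standard library alone. This is a minimal sketch under stated assumptions: render_frontmatter is a hypothetical helper, it handles only strings and lists of strings, and it writes every value as a double-quoted string (valid YAML, but cosmetically different from the unquoted example above).

```python
import json


def render_frontmatter(meta):
    """Render a flat dict (strings and lists of strings) as YAML frontmatter."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, (list, tuple)):
            if not value:
                continue  # leave out empty lists entirely
            lines.append(f"{key}:")
            # JSON string syntax is also valid YAML double-quoted syntax.
            lines.extend(f"  - {json.dumps(item, ensure_ascii=False)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(str(value), ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Quoting every value avoids surprises with titles that contain colons, quotes, or leading special characters.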

Using with Static Site Generators

The output is compatible with most static site generators. You may need to adjust configuration based on your chosen tool:

Hugo

# Create new Hugo site
hugo new site mysite
cd mysite

# Copy content
cp -r ../my-site/content/* content/
cp -r ../my-site/media static/

# Adjust hugo.toml (config.toml in older Hugo versions) as needed
hugo server

MkDocs

# Create new MkDocs project
mkdocs new mysite
cd mysite

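# Copy content to docs folder (a fresh project has no docs/posts yet)
mkdir -p docs/posts
cp -r ../my-site/content/posts/* docs/posts/
cp -r ../my-site/content/pages/* docs/
cp -r ../my-site/media docs/

# Update mkdocs.yml configuration; the ../../media links may need adjusting for this layout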
mkdocs serve

Jekyll

# Create new Jekyll site
jekyll new mysite
cd mysite

# Copy posts and media (create assets/ first in case the theme did not)
mkdir -p assets
cp -r ../my-site/content/posts/* _posts/
cp -r ../my-site/media assets/

# Jekyll might require frontmatter and media link adjustments
jekyll serve

Media File Handling

Supported File Types

  • Images: .jpg, .jpeg, .png, .gif, .webp, .svg, .ico, .bmp
  • Audio: .mp3, .wav, .ogg, .m4a, .flac, .aac
  • Video: .mp4, .mov, .avi, .wmv, .flv, .webm, .mkv
  • Documents: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .txt, .zip
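
The categorization above amounts to a lookup from file extension to folder name. The sketch below shows one way to write it; the extension lists come from this README, but the exact shape of the script's media_extensions dictionary is an assumption, and media_category is a hypothetical helper.

```python
from pathlib import Path

# Extension lists taken from the README; the dictionary layout is assumed.
MEDIA_EXTENSIONS = {
    "images": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".ico", ".bmp"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a", ".flac", ".aac"},
    "video": {".mp4", ".mov", ".avi", ".wmv", ".flv", ".webm", ".mkv"},
    "documents": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".txt", ".zip"},
}


def media_category(filename):
    """Return the media folder for a file, or 'other' for unknown extensions."""
    ext = Path(filename).suffix.lower()  # lower-case so .JPG matches .jpg
    for category, extensions in MEDIA_EXTENSIONS.items():
        if ext in extensions:
            return category
    return "other"
```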

URL Replacement

The script automatically replaces WordPress media URLs with relative paths:

Before: https://example.com/wp-content/uploads/2024/01/image.jpg
After: ../../media/images/2024/01/image.jpg

This makes the content portable and independent of the original domain.
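
A rewrite of this kind can be sketched with a regular expression that ignores the scheme and host and keeps only the path after wp-content/uploads/. The code below is an illustration, not the script's implementation: rewrite_media_urls, UPLOAD_URL, and the abbreviated CATEGORY_BY_EXT table are all assumptions.

```python
import re
from pathlib import PurePosixPath

# Any scheme, host, and install path in front of /wp-content/uploads/ is
# discarded, which is what makes the result independent of the original domain.
UPLOAD_URL = re.compile(r"https?://[^\s\"')]*?/wp-content/uploads/([^\s\"')]+)")

# Abbreviated table for the example; a real one would cover every supported type.
CATEGORY_BY_EXT = {
    ".jpg": "images", ".jpeg": "images", ".png": "images", ".gif": "images",
    ".mp3": "audio", ".mp4": "video", ".pdf": "documents",
}


def rewrite_media_urls(text, prefix="../../media"):
    """Replace absolute upload URLs with relative media paths."""
    def replace(match):
        rel_path = match.group(1)  # e.g. 2024/01/image.jpg
        ext = PurePosixPath(rel_path).suffix.lower()
        category = CATEGORY_BY_EXT.get(ext, "other")
        return f"{prefix}/{category}/{rel_path}"

    return UPLOAD_URL.sub(replace, text)
```

URLs that carry a query string (for example ?w=300) would fall into the other category with this simple version and would need extra handling.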

Metadata File

The metadata.json file contains:

{
  "site_title": "My WordPress Site",
  "site_url": "https://example.com",
  "migration_date": "2024-01-15T12:00:00",
  "authors": { ... },
  "categories": [ ... ],
  "tags": [ ... ],
  "posts": [ ... ],
  "pages": [ ... ]
}

Use this for reference or to build custom tools.
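
As a starting point for such a tool, the following sketch loads metadata.json and reports how many entries each top-level collection holds. It relies only on the key names shown above; the structure of the individual entries is not assumed, and summarize_metadata is a hypothetical name.

```python
import json


def summarize_metadata(path):
    """Return the site title and the size of each top-level collection."""
    with open(path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    summary = {"site_title": metadata.get("site_title")}
    for key in ("authors", "categories", "tags", "posts", "pages"):
        # len() works for both the authors mapping and the list-valued keys.
        summary[key] = len(metadata.get(key, []))
    return summary
```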

Troubleshooting

"XML file not found"

  • Verify the path to your XML export file is correct
  • Use absolute paths if relative paths don't work

"wp-content directory not found"

  • Ensure you've downloaded the complete wp-content folder
  • You can skip this parameter if you only want posts/pages

Missing media files

  • Check that media files exist in wp-content/uploads/
  • Some WordPress installations may store media elsewhere

Content not converting properly

  • The script handles common HTML, but complex shortcodes may need manual review
  • Check the output files and adjust as needed
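
To locate leftover shortcodes for manual review, a rough scan of the generated Markdown can help. The pattern below is a heuristic and an assumption of this example, not something the script provides: it matches bracketed names such as [gallery ...] or [/caption] that are not followed by a link target, so it can still report false positives such as bare reference-style links.

```python
import re
from collections import Counter

# A bracketed word with optional attributes, not followed by "(" or "[",
# which would indicate an ordinary Markdown link instead of a shortcode.
SHORTCODE = re.compile(r"\[/?([a-zA-Z][\w-]*)(?:\s[^\]\n]*)?\](?![(\[])")


def find_shortcodes(markdown_text):
    """Count probable WordPress shortcodes by name (opening and closing tags)."""
    return Counter(match.group(1).lower() for match in SHORTCODE.finditer(markdown_text))
```

Running this over each file in content/ gives a quick list of the shortcode names that still need attention.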

Special characters in filenames

  • The script creates URL-safe slugs automatically
  • Original titles are preserved in frontmatter
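
For reference, a URL-safe slug can be derived from a title along these lines. This is a generic sketch of the technique, with slugify as a hypothetical name; the script's own rules (and whether it reuses the slug stored in the export) may differ.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented characters, then drop whatever is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "untitled"  # fall back when nothing usable remains
```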

Post-Migration Checklist

  • Review converted markdown files for formatting issues
  • Check that media files are correctly linked
  • Verify dates and metadata are accurate
  • Test with your chosen static site generator
  • Update internal links if needed
  • Review and clean up content
  • Set up redirects if old WordPress URLs should keep working
  • Test locally before deploying
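
The media link check can be partly automated. The sketch below walks the content folder of the output directory, extracts relative links that point into media/, and reports the ones whose target file is missing. It assumes the output structure and the ../../media/... link form described in this README; find_broken_media_links is a hypothetical helper.

```python
import re
from pathlib import Path

# Relative link targets such as (../../media/images/2024/01/photo.jpg)
MEDIA_LINK = re.compile(r"\(((?:\.\./)+media/[^)\s]+)\)")


def find_broken_media_links(site_dir):
    """Return (markdown file, link) pairs whose media target does not exist."""
    broken = []
    for md_file in sorted(Path(site_dir, "content").rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for match in MEDIA_LINK.finditer(text):
            # Resolve the link relative to the folder containing the Markdown file.
            target = (md_file.parent / match.group(1)).resolve()
            if not target.exists():
                broken.append((str(md_file.relative_to(site_dir)), match.group(1)))
    return broken
```

An empty result means every relative media link found in the Markdown files points at an existing file.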

Advanced Usage

Custom Media Organization

Edit the media_extensions dictionary in the script to customize file type categorization.

Filtering Content

To export only specific posts or pages, modify the XML file before processing or filter the output afterward.
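
Filtering the output afterward is often the simpler route, because post filenames start with the publication date. As an illustration, the hypothetical helper below selects the posts from a single year by filename alone; it assumes the YYYY-MM-DD-slug.md convention described earlier.

```python
import re
from pathlib import Path

POST_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$")


def posts_from_year(posts_dir, year):
    """Return the post files in posts_dir whose date prefix falls in the given year."""
    selected = []
    for path in sorted(Path(posts_dir).glob("*.md")):
        match = POST_NAME.match(path.name)
        if match and int(match.group(1)) == year:
            selected.append(path)
    return selected
```

The returned paths can then be copied into the static site generator's content folder instead of the full posts directory.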

Batch Processing

Process multiple WordPress exports:

for xml in exports/*.xml; do
  name=$(basename "$xml" .xml)
  python3 wordpress-to-markdown.py \
    --xml "$xml" \
    --content wp-content \
    --output "sites/$name"
done

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

This script is provided as-is for personal and commercial use.

Support

For issues or questions:

  1. Check the Troubleshooting section
  2. Review the output metadata.json for clues
  3. Examine the generated markdown files
  4. Open an issue on the repository

Credits

Created to simplify WordPress migrations to modern static site generators while maintaining content integrity and media file organization.