WordPress to Markdown Migration Tool
A Python script that converts WordPress XML exports and media files into a clean Markdown structure that is not tied to any particular static site generator.
Features
- ✅ Converts WordPress posts and pages to Markdown with YAML frontmatter
- ✅ Copies and organizes all media files (images, audio, video, documents)
- ✅ URL-independent (works regardless of original domain)
- ✅ Preserves metadata (categories, tags, authors, dates)
- ✅ Converts HTML content to Markdown
- ✅ Compatible with Hugo, MkDocs, Jekyll, Gatsby, and other static site generators
- ✅ Generates metadata.json for reference
Requirements
- Python 3.6 or higher (no external dependencies required)
- WordPress XML export file
- wp-content folder (optional, but recommended for media files)
Installation
- Clone or download this repository
- Make the script executable (optional):
chmod +x wordpress-to-markdown.py
Getting Your WordPress Data
Step 1: Export XML from WordPress
- Log into your WordPress admin panel
- Go to Tools → Export
- Select All content
- Click Download Export File
- Save the XML file (usually named like wordpress.YYYY-MM-DD.xml)
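Before running the migration, it can help to confirm that the export contains what you expect. The sketch below is not part of the script. It assumes the export uses the standard WXR `wp:post_type` element and matches it by local name, because the version number in the WXR namespace URI can differ between WordPress releases. The function name `count_post_types` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_post_types(xml_source):
    """Count <item> elements in a WordPress export, grouped by wp:post_type.

    Hypothetical helper; xml_source may be a file path or an open file object.
    """
    counts = Counter()
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "item":
            continue
        for child in elem:
            # Tags look like "{http://wordpress.org/export/1.2/}post_type";
            # compare only the local name so any WXR version matches.
            if child.tag.endswith("}post_type"):
                counts[(child.text or "").strip()] += 1
        elem.clear()  # release the item's children to limit memory use
    return counts
```

Calling `count_post_types("wordpress.2024-01-15.xml")` returns a `Counter` keyed by post type, which you can compare against the counts shown in the WordPress admin panel.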
Step 2: Get wp-content Folder
Option A: Via FTP/SFTP
- Connect to your WordPress hosting via FTP/SFTP
- Navigate to your WordPress installation directory
- Download the entire wp-content folder
Option B: Via Hosting Control Panel
- Log into your hosting control panel (cPanel, Plesk, etc.)
- Use File Manager to locate wp-content
- Compress and download it
Option C: Via SSH
# On your server
cd /path/to/wordpress
tar -czf wp-content.tar.gz wp-content/
# Then download wp-content.tar.gz
Usage
Basic Usage
python3 wordpress-to-markdown.py \
--xml wordpress.2024-01-15.xml \
--content wp-content \
--output my-site
Parameters
- --xml (required): Path to WordPress XML export file
- --content (optional): Path to wp-content directory (for media files)
- --output (required): Directory where converted files will be saved
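For readers who want to wrap or extend the command line, the three options could be declared with `argparse` roughly as follows. This is a sketch based only on the parameter list above. The script's actual parsing code is not shown here, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser for the three documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Convert a WordPress export to Markdown."
    )
    parser.add_argument("--xml", required=True,
                        help="Path to WordPress XML export file")
    # --content is optional; None means "skip media files".
    parser.add_argument("--content", default=None,
                        help="Path to wp-content directory (for media files)")
    parser.add_argument("--output", required=True,
                        help="Directory where converted files will be saved")
    return parser
```

With this layout, omitting `--content` leaves `args.content` as `None`, which a caller can use to decide whether to copy media.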
Example
# Full migration with media files
python3 wordpress-to-markdown.py \
--xml ~/Downloads/wordpress.xml \
--content ~/Downloads/wp-content \
--output ~/Sites/my-migrated-site
# Posts/pages only (no media)
python3 wordpress-to-markdown.py \
--xml ~/Downloads/wordpress.xml \
--output ~/Sites/my-migrated-site
Output Structure
The script creates the following structure:
my-site/
├── content/
│ ├── posts/
│ │ ├── 2024-01-15-first-post.md
│ │ ├── 2024-02-20-second-post.md
│ │ └── ...
│ └── pages/
│ ├── about.md
│ ├── contact.md
│ └── ...
├── media/
│ ├── images/
│ │ └── 2024/01/photo.jpg
│ ├── audio/
│ │ └── 2024/02/podcast.mp3
│ ├── video/
│ │ └── 2024/03/demo.mp4
│ ├── documents/
│ │ └── 2024/01/guide.pdf
│ └── other/
│ └── ...
└── metadata.json
File Naming Convention
Posts: YYYY-MM-DD-slug.md (e.g., 2024-01-15-hello-world.md)
Pages: slug.md (e.g., about.md)
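The naming convention can be expressed as a small function. This is an illustration of the rule stated above, not the script's own code. The name `output_filename` and its parameters are assumptions.

```python
from datetime import datetime


def output_filename(slug, post_type, published=None):
    """Return the Markdown file name for a post or page.

    Hypothetical helper: posts get a YYYY-MM-DD- prefix from their
    publication date; pages use the slug alone.
    """
    if post_type == "post":
        if published is None:
            raise ValueError("posts need a publication date")
        return f"{published:%Y-%m-%d}-{slug}.md"
    return f"{slug}.md"
```

For example, a post with slug `hello-world` published on 15 January 2024 maps to `2024-01-15-hello-world.md`, while a page with slug `about` maps to `about.md`.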
Markdown Frontmatter
Each markdown file includes YAML frontmatter with metadata:
---
title: My First Post
date: 2024-01-15T10:30:00
author: John Doe
excerpt: A brief description of the post
categories:
- Technology
- Web Development
tags:
- WordPress
- Migration
- Markdown
---
Post content here...
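If you need to regenerate or patch frontmatter yourself, a dependency-free approach is sketched below. It is an assumption about one workable technique, not the script's implementation. Unlike the example above, it double-quotes every value, relying on YAML 1.2 being designed to accept JSON string syntax, so that titles containing colons or quotes stay valid. The names `yaml_scalar` and `render_frontmatter` are hypothetical.

```python
import json


def yaml_scalar(value):
    """Quote a value as a YAML double-quoted scalar via JSON string syntax."""
    return json.dumps(str(value), ensure_ascii=False)


def render_frontmatter(meta):
    """Render a flat dict (string or list-of-string values) as frontmatter.

    Hypothetical helper; nested mappings are not handled.
    """
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, (list, tuple)):
            lines.append(f"{key}:")
            lines.extend(f"  - {yaml_scalar(item)}" for item in value)
        else:
            lines.append(f"{key}: {yaml_scalar(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Quoting everything is the conservative choice: it costs some readability but avoids surprises such as a title like `Tips: a guide` being parsed as a nested mapping.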
Using with Static Site Generators
The output is compatible with most static site generators. You may need to adjust configuration based on your chosen tool:
Hugo
# Create new Hugo site
hugo new site mysite
cd mysite
# Copy content
cp -r ../my-site/content/* content/
cp -r ../my-site/media static/
# Adjust config.toml (named hugo.toml in newer Hugo releases) as needed
hugo server
MkDocs
# Create new MkDocs project
mkdocs new mysite
cd mysite
# Copy content to docs folder (mkdocs new does not create docs/posts)
mkdir -p docs/posts
cp -r ../my-site/content/posts/* docs/posts/
cp -r ../my-site/content/pages/* docs/
cp -r ../my-site/media docs/
# Update mkdocs.yml configuration
mkdocs serve
Jekyll
# Create new Jekyll site
jekyll new mysite
cd mysite
# Copy posts
cp -r ../my-site/content/posts/* _posts/
# Create assets/ first so media lands in assets/media
mkdir -p assets
cp -r ../my-site/media assets/
# Jekyll might require frontmatter adjustments
jekyll serve
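One frontmatter adjustment you might want for Jekyll is a `layout` key, since the generated files shown earlier do not include one. The sketch below adds a key to an existing frontmatter block. It assumes the block starts on the first line and is closed by a line beginning with `---`. The function name is hypothetical, and whether your theme provides a `post` layout depends on your Jekyll setup.

```python
def add_frontmatter_key(markdown_text, key, value):
    """Insert `key: value` into existing frontmatter unless already present.

    Hypothetical helper; returns the text unchanged if no frontmatter
    block is found.
    """
    if not markdown_text.startswith("---\n"):
        return markdown_text  # no frontmatter block to extend
    end = markdown_text.find("\n---", 4)
    if end == -1:
        return markdown_text  # opening fence without a closing one
    header = markdown_text[4:end]
    if any(line.startswith(f"{key}:") for line in header.splitlines()):
        return markdown_text  # key already set; leave it alone
    return f"---\n{header}\n{key}: {value}{markdown_text[end:]}"
```

Applied to each file in `_posts/` with `key="layout"` and `value="post"`, this leaves files that already declare a layout untouched.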
Media File Handling
Supported File Types
- Images: .jpg, .jpeg, .png, .gif, .webp, .svg, .ico, .bmp
- Audio: .mp3, .wav, .ogg, .m4a, .flac, .aac
- Video: .mp4, .mov, .avi, .wmv, .flv, .webm, .mkv
- Documents: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .txt, .zip
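The mapping from extension to media subfolder can be pictured as a dictionary lookup. The sketch below uses the extensions listed above, but the exact shape of the script's `media_extensions` dictionary is an assumption, and `media_category` is a hypothetical helper.

```python
from pathlib import Path

# Assumed layout for media_extensions; the extensions come from the
# supported file types listed in this README.
media_extensions = {
    "images": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".ico", ".bmp"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a", ".flac", ".aac"},
    "video": {".mp4", ".mov", ".avi", ".wmv", ".flv", ".webm", ".mkv"},
    "documents": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
                  ".txt", ".zip"},
}


def media_category(filename):
    """Return the media/ subfolder for a file, or "other" if unrecognized."""
    suffix = Path(filename).suffix.lower()  # match .JPG and .jpg alike
    for category, extensions in media_extensions.items():
        if suffix in extensions:
            return category
    return "other"
```

Anything with an unlisted extension falls through to `other`, which matches the `media/other/` folder in the output structure.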
URL Replacement
The script automatically replaces WordPress media URLs with relative paths:
Before: https://example.com/wp-content/uploads/2024/01/image.jpg
After: ../../media/images/2024/01/image.jpg
This makes the content portable and independent of the original domain.
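A rewrite like this can be done with a regular expression that ignores the scheme and host and keeps only the path after `wp-content/uploads/`. The following is a minimal sketch under that assumption, not the script's actual code. `UPLOAD_URL`, `relativize_media_urls`, and the `category_for` callback are hypothetical names.

```python
import re

# Matches any scheme and host followed by /wp-content/uploads/<path>.
UPLOAD_URL = re.compile(
    r"https?://[^\s\"'()<>]+?/wp-content/uploads/([^\s\"'()<>]+)"
)


def relativize_media_urls(text, category_for, prefix="../../media"):
    """Replace absolute upload URLs with relative media paths.

    category_for is a callable mapping a file path to a media subfolder
    such as "images"; prefix is the path from the Markdown file to media/.
    """
    def replace(match):
        path = match.group(1)  # e.g. "2024/01/image.jpg"
        return f"{prefix}/{category_for(path)}/{path}"

    return UPLOAD_URL.sub(replace, text)
```

Because the pattern never inspects the domain, the same code handles sites that changed hostnames or moved between http and https over their lifetime.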
Metadata File
The metadata.json file contains:
{
"site_title": "My WordPress Site",
"site_url": "https://example.com",
"migration_date": "2024-01-15T12:00:00",
"authors": { ... },
"categories": [ ... ],
"tags": [ ... ],
"posts": [ ... ],
"pages": [ ... ]
}
Use this for reference or to build custom tools.
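As a starting point for custom tooling, the file can be loaded with the standard `json` module. The sketch below only counts entries under the top-level keys shown above and makes no assumption about what each entry contains. `summarize_metadata` is a hypothetical name.

```python
import json


def summarize_metadata(path):
    """Return a short summary of a metadata.json file (illustrative helper)."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return {
        "site_title": data.get("site_title"),
        # Missing keys count as zero rather than raising.
        "posts": len(data.get("posts", [])),
        "pages": len(data.get("pages", [])),
        "categories": len(data.get("categories", [])),
        "tags": len(data.get("tags", [])),
    }
```

Comparing these counts with the number of files under `content/posts/` and `content/pages/` is a quick way to spot items that were skipped.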
Troubleshooting
"XML file not found"
- Verify the path to your XML export file is correct
- Use absolute paths if relative paths don't work
"wp-content directory not found"
- Ensure you've downloaded the complete wp-content folder
- You can skip this parameter if you only want posts/pages
Missing media files
- Check that media files exist in wp-content/uploads/
- Some WordPress installations may store media elsewhere
Content not converting properly
- The script handles common HTML, but complex shortcodes may need manual review
- Check the output files and adjust as needed
Special characters in filenames
- The script creates URL-safe slugs automatically
- Original titles are preserved in frontmatter
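To see what a URL-safe slug might look like for a given title, here is one common way to derive it. This is an assumed approach for illustration, and the script's own rules may differ (for instance, it may reuse the slug already stored in the export). `slugify` is a hypothetical name.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated, ASCII-only slug."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "untitled"  # fall back when nothing usable remains
```

Note that this approach discards characters with no ASCII decomposition, so titles written entirely in non-Latin scripts would all collapse to the fallback and need a different strategy.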
Post-Migration Checklist
- Review converted markdown files for formatting issues
- Check that media files are correctly linked
- Verify dates and metadata are accurate
- Test with your chosen static site generator
- Update internal links if needed
- Review and clean up content
- Set up redirects if maintaining same URLs
- Test locally before deploying
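The media-link check in this list can be partly automated. The sketch below scans the generated Markdown for relative links into `media/` and reports targets that do not exist on disk. It assumes the output layout described earlier and Markdown-style `](...)` links only. `find_broken_media_links` is a hypothetical helper.

```python
import re
from pathlib import Path

# Markdown links/images whose target is a relative path into media/.
MEDIA_LINK = re.compile(r"\]\(((?:\.\./)+media/[^)\s]+)\)")


def find_broken_media_links(site_root):
    """Return (markdown_file, link) pairs whose target file is missing."""
    broken = []
    for md_file in sorted(Path(site_root, "content").rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for link in MEDIA_LINK.findall(text):
            # Resolve the link relative to the file that contains it.
            if not (md_file.parent / link).exists():
                broken.append((md_file, link))
    return broken
```

Run it against the output directory (for example `find_broken_media_links("my-site")`) before copying files into a static site generator, since the relative paths are only meaningful in the original layout.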
Advanced Usage
Custom Media Organization
Edit the media_extensions dictionary in the script to customize file type categorization.
Filtering Content
To export only specific posts or pages, modify the XML file before processing or filter the output afterward.
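Filtering the output afterward is often the simpler route, because post file names start with the publication date. As a rough example, the following selects posts from a single year. It relies only on the YYYY-MM-DD-slug.md naming convention, and `posts_from_year` is a hypothetical name.

```python
from pathlib import Path


def posts_from_year(site_root, year):
    """List post files whose YYYY-MM-DD- file name prefix falls in `year`."""
    posts_dir = Path(site_root, "content", "posts")
    # The date prefix makes a glob on the year sufficient.
    return sorted(posts_dir.glob(f"{year:04d}-*.md"))
```

The returned paths can then be copied into your static site generator's content folder instead of copying the whole `posts/` directory.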
Batch Processing
Process multiple WordPress exports:
for xml in exports/*.xml; do
name=$(basename "$xml" .xml)
python3 wordpress-to-markdown.py \
--xml "$xml" \
--content wp-content \
--output "sites/$name"
done
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
This script is provided as-is for personal and commercial use.
Support
For issues or questions:
- Check the Troubleshooting section
- Review the output metadata.json for clues
- Examine the generated markdown files
- Open an issue on the repository
Credits
Created to simplify WordPress migrations to modern static site generators while maintaining content integrity and media file organization.