In the past, when I moved blogs to different platforms, I lost blog photos because I didn’t make an effort to back them up or transfer them. Today, I decided to establish a process for grabbing all of the images that I use in this blog, just in case I ever need to have a copy.
I frequently back up the blog itself as an XML file using the built-in export tool. However, this only backs up the written content and architecture; it doesn’t back up the media files.
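If you want a checklist of the media the blog refers to, the export file can help. The sketch below assumes the export uses the usual WordPress layout, in which each media item carries an attachment_url element; I have not checked every export version, so treat the element name as an assumption. It only lists URLs and does not download anything.

```python
import xml.etree.ElementTree as ET


def attachment_urls(wxr_data):
    """Return the media URLs recorded in a WordPress export document.

    wxr_data is the export file's content (str or bytes).
    """
    root = ET.fromstring(wxr_data)
    urls = []
    for element in root.iter():
        # Compare only the local tag name so the export's namespace
        # version does not matter. The element name is an assumption.
        if element.tag.endswith("}attachment_url") and element.text:
            urls.append(element.text.strip())
    return urls
```

For a real export, read the file as bytes first, for example attachment_urls(open("export.xml", "rb").read()), where export.xml is a placeholder file name.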
Several programs could do this, and there are probably various methods as well. In my case, I’m using a WordPress.com blog with a custom domain, and I run the backup on a Mac.
To scrape the blog content I chose SiteSucker, a donationware program for OS X. To reduce the download to a minimal set that includes the images, I changed some of the default settings.
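First, you’ll need to enter the address of the blog in SiteSucker. Next, in the settings, use the File Types options to limit the download to image files.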
Finally, go to the Paths section and under Paths to Include add two URLs. One should be the address for the blog and the other should be the path where uploaded blog images are stored at WordPress.com. You’ll need to change username to match your blog. Typically, it’s the same as the username for the account that was used to create the blog.
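To make the idea of include paths concrete, here is a rough illustration of prefix matching. It is not SiteSucker’s actual matching logic, which I have not inspected, and both addresses in it are hypothetical placeholders rather than the real ones to enter.

```python
from urllib.parse import urlsplit

# Hypothetical placeholders: substitute your own blog address and the
# address where your uploaded images are stored.
INCLUDE_PREFIXES = [
    "https://blog.example.com/",
    "https://username.files.example.com/",
]


def is_included(url, prefixes=INCLUDE_PREFIXES):
    """Return True if url sits on an included host under an included path."""
    target = urlsplit(url)
    target_path = target.path or "/"
    for prefix in prefixes:
        allowed = urlsplit(prefix)
        # The scheme (http or https) is deliberately ignored here.
        same_host = target.netloc.lower() == allowed.netloc.lower()
        if same_host and target_path.startswith(allowed.path or "/"):
            return True
    return False
```

With these placeholders, an image on the uploads host passes the check, while a link to any other site is skipped.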
In most cases you should be ready to click OK to save changes and then click Download to begin grabbing the files. When the process has completed you’ll end up with two folders in the destination folder, one for each address. One will represent content from the blog address. You can delete this folder. The other will include all of your images, sorted by the date they were added to a post. The full size images should be present along with copies that match the sizes of the images as they appeared in your posts.
These steps assume that your images are stored in your WordPress.com account. If not, you may need to add more paths.
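A quick way to sanity-check the result is to count the images in each dated folder. This is a minimal sketch that assumes the images folder is organized into year and month subfolders and that the images use the extensions listed; the function names are my own.

```python
from collections import Counter
from pathlib import Path, PurePosixPath

# Assumed set of image extensions; extend it if your blog uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def count_images(relative_paths):
    """Count image files per folder, given paths relative to the images folder."""
    counts = Counter()
    for name in relative_paths:
        path = PurePosixPath(name)
        if path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.as_posix()] += 1
    return counts


def scan(images_folder):
    """Walk images_folder on disk and count the images in each subfolder."""
    root = Path(images_folder)
    files = (
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    )
    return count_images(files)
```

Calling scan() on the downloaded images folder returns a count per subfolder, which makes an empty or missing month easy to spot.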
This information can be used to grab any files you hosted via WordPress.com. In this example SiteSucker was configured to only grab image files, but you can use different File Types options to grab other content such as PDFs.
Note that regardless of the settings that you choose for file types, SiteSucker will always download HTML and CSS files. However, those files should only be included in the folder for the blog address – you probably won’t have to remove any from the folders that include images/files.
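If you would rather confirm that than assume it, a small check can list any HTML or CSS files that ended up among the images. This is only a sketch; the function name and the extension list are my own choices.

```python
from pathlib import PurePosixPath

# Extensions treated as page files rather than media.
PAGE_SUFFIXES = {".html", ".htm", ".css"}


def stray_page_files(relative_paths):
    """Return the HTML and CSS files found among the given relative paths."""
    return sorted(
        name
        for name in relative_paths
        if PurePosixPath(name).suffix.lower() in PAGE_SUFFIXES
    )
```

An empty list means the images folder holds no page files to remove.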