Converting Word Files to ePub Files
Jan 3rd, 2010 by Sarah
In an earlier post, I mentioned that as far as the Rymellan 1: Disobedience Means Death eBook goes, I’ll be starting with Smashwords. Why Smashwords? A few reasons:
- Its terms are generous
- It doesn’t require exclusivity, so you’re free to offer your eBooks elsewhere, in addition to Smashwords
- It’s very friendly to independent publishers
- It distributes your eBook to several online partners
That last reason is important. Through Smashwords, you can sell your eBooks at Barnes & Noble, Amazon (it’s not easy for independent Canadian publishers to publish to the Kindle directly), Kobo (the rebranded Shortcovers, Chapters-Indigo’s online store), and others.
If you format your Word file in accordance with the Smashwords style guide, its “meatgrinder” will take your Word file and grind out an eBook in numerous formats, including ePub and Mobi. Currently there’s no way to upload already formatted eBook files.
So it’s sort of a trade-off right now. You can get your eBook into several important online venues, but probably not formatted as well as it could be if you’d done it yourself. And if you decide that you want to offer your eBook at another venue, the file you uploaded to Smashwords probably won’t be acceptable. You’ll have to format it yet again in accordance with whatever that venue wants.
Right now, eBook formats are a mess. Too many. We need standardization, and ePub looks like it might win the battle. With that in mind, I spent most of today teaching myself how to create ePub files. It’s fairly straightforward. If you’re familiar with HTML and know how to zip files, it’s easy. I thought I’d share how I plan to do it.
Okay, so I experimented with The Dance, a story at Rymellan Fiction. I started out with a Word 2003 compatible file. How did I take that to an ePub file? Well, I searched for free tools that will convert Word files to ePub files, hoping someone else had already automated the conversion. I came across this handy list:
How can I create ePub files from my books?
To make a long story short, I successfully created an ePub file (and a Mobi file, the format the Kindle uses) using . . . drum roll, please . . . eCub, #9 on the list. The runner-up is EPubGen, the ePub converter available at Google (#6).
So, download, install, and run eCub, and you’ll be prompted to create a project and supply some metadata about your book (title, author, etc.). It will also want to know what file you want to convert, but unfortunately it doesn’t convert Word files. It only converts text or XHTML files. Since I have XHTML files for The Dance, I first thought of those, but they contain a bunch of stuff I wouldn’t want in an eBook (links like Contact, etc.). So I went with a text file.
You can save a Word document as a text file using Save As from within Word. One important note: make sure you specify UTF-8 as the encoding, since it isn’t the default. Use the text file as the source file for the eCub conversion. Press eCub’s Compile button, and boom, you have an ePub file. It’s that easy. Um, not really! You see, when you saved your Word file as a text file, all your formatting (text alignment, italics, etc.) was removed. Yikes!
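If you forget to pick UTF-8, Word will most likely have used your system’s default Windows code page, and you don’t necessarily have to go back to Word to fix it. Here’s a minimal Python sketch that makes sure a text file ends up as UTF-8. It assumes that a file which isn’t already valid UTF-8 was saved as Windows-1252 (`cp1252` in Python), which is typical on Western systems; if yours uses a different code page, change `source_encoding`. The function name `ensure_utf8` is my own, not part of any tool.

```python
from pathlib import Path


def ensure_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Copy the text file src to dst, making sure dst is encoded as UTF-8.

    Assumption: a file that isn't valid UTF-8 was saved in a legacy Windows
    code page (Windows-1252, "cp1252", on most Western systems).
    """
    raw = src.read_bytes()
    try:
        # Already UTF-8 (plain ASCII counts too); "utf-8-sig" also drops a
        # leading byte-order mark if there is one.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not UTF-8, so decode it with the assumed legacy code page instead.
        text = raw.decode(source_encoding)
    # Write bytes directly so line endings are preserved exactly as they were.
    dst.write_bytes(text.encode("utf-8"))
```

For example, `ensure_utf8(Path("TheDance.txt"), Path("TheDance-utf8.txt"))` (hypothetical file names) would give you a file you can hand to eCub. Checking for valid UTF-8 first matters: decoding a file that’s already UTF-8 as Windows-1252 would garble every accented letter and curly quote.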
So how do you get it back? Well, the hard way, I’m afraid. This is where a familiarity with HTML and CSS will serve you well. When you imported your text file, eCub created an XHTML file from it. In fact, an ePub file is a zip file that contains a few XML files describing your book (its metadata, table of contents, and reading order) and, of course, its contents (XHTML files, specifically). To get back your formatting, you have to edit the XHTML files and the CSS file that contains the styling information. For example, to bring back italics, I had to define a style for it in the CSS file and then edit the XHTML file containing the text of The Dance. You may also want to add back styles for scene breaks, chapter beginnings, etc.
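Because an ePub is just a zip archive, you can peek inside one without any special tools. The following Python sketch uses the standard `zipfile` module to group an ePub’s files by role. The grouping by file extension is my own rough heuristic, not something the ePub format defines.

```python
import zipfile
from collections import defaultdict


def describe_epub(epub_path):
    """Group the files inside an ePub (a zip archive) by their likely role.

    epub_path can be a file name or a binary file object.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith((".xhtml", ".html", ".htm")):
                groups["content (XHTML)"].append(name)
            elif lower.endswith(".css"):
                groups["styling (CSS)"].append(name)
            elif name == "mimetype" or lower.endswith((".opf", ".ncx", ".xml")):
                # mimetype, META-INF/container.xml, the OPF package file, the NCX table of contents
                groups["package description"].append(name)
            else:
                groups["other (images, fonts, ...)"].append(name)
    return dict(groups)
```

Running `for role, names in describe_epub("TheDance.epub").items(): print(role, names)` on a hypothetical `TheDance.epub` shows you which XHTML and CSS files you’ll be editing to restore your formatting.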
When you do this editing, make sure you edit the source files for the conversion. Don’t edit the files under the build directory, since those files are regenerated every time you compile. Read through eCub’s Help. It’s, well, helpful!
If you’re converting a large file, like a novel or a short story collection, you should break the file into smaller files. Ideally, each chapter or story should go into its own text file. It’s easy to import a set of files into eCub and then tell it to generate an ePub file from them. Having individual files will allow the tool to generate a decent table of contents, and eReaders will load each chapter and story faster than they would when dealing with one humongous file.
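If your manuscript marks each chapter with a heading line, splitting the text file can be automated. This Python sketch assumes chapters start on lines like `Chapter 1` (adjust the hypothetical `CHAPTER_HEADING` pattern to match your own headings) and writes each piece to its own UTF-8 file with a hypothetical `partNN.txt` name.

```python
import re
from pathlib import Path

# Hypothetical convention: each chapter begins on its own line, e.g. "Chapter 12".
CHAPTER_HEADING = re.compile(r"^Chapter[ \t]+\d+[ \t]*$", re.MULTILINE)


def split_chapters(text, heading=CHAPTER_HEADING):
    """Split a manuscript into chunks, each starting at a chapter heading.

    Any text before the first heading (a title page, say) becomes its own chunk.
    """
    starts = [m.start() for m in heading.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]  # drop empty chunks (e.g. blank lead-in text)


def write_chapters(src, out_dir):
    """Write each chunk of src to its own UTF-8 file: part01.txt, part02.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(src).read_text(encoding="utf-8-sig")  # tolerate a byte-order mark
    paths = []
    for number, chunk in enumerate(split_chapters(text), start=1):
        path = out / f"part{number:02d}.txt"
        path.write_text(chunk + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

The resulting files can then be imported into eCub as a set. For a story collection, you’d swap in a pattern that matches your story titles instead of numbered chapters.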
If your book contains complex elements like tables and images, you’ll have more work to do, and eCub might not work for you at all. If your stuff is plain text with minimal formatting, this method works and isn’t all that time-consuming.
The generated ePub file looks fine in the Sony eReader software I have installed, except for the cover, which is cut off. You supply the cover file separately to eCub. I mentioned that eCub will also create Mobi files. I tried that too, and the cover looks fine in MobiReader, so I’ll point the finger at the Sony software here, though it might not really be to blame. Apparently ePub is still grappling with how to define covers. I found this article about Best Practices in ePub Cover Images, but I haven’t tried the described method yet.
A few words about the runner-up: the converter at Google, called EPubGen, will take a Word file, convert it to ePub, and maintain all the formatting. Not only that, it’s easy to do. You run the software, drag the Word file you want to convert onto it, watch an image of the Word file jump around for about 5 seconds, and then drag the resulting ePub file off the tool into a directory. Wow! Well, yes, but hang on a minute.
First, there isn’t an option to include a cover, though you can easily add one manually. Second, when I loaded the generated ePub file into the Sony reader software and flipped through it, I occasionally came across arbitrary page breaks. I double-checked the Word file and couldn’t see any formatting errors there, so I unzipped the ePub file and took a look at the HTML files. It turns out that the tool arbitrarily chunks up the Word file into numerous HTML files. It looks ugly and might confuse readers. Unlike with eCub, you can’t import a set of files into EPubGen and then create an ePub file from them. You could fix the problem by editing the HTML files, but you’ll also have to create the table of contents, if you want one. I think EPubGen might be better than eCub for short files, meaning a few pages in length. For anything longer, it’s not the best choice.
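A quick way to spot this kind of chunking is to look at the ePub’s spine: the list in the package (OPF) file that gives the reading order of the content files. Here’s a Python sketch, using only the standard library, that follows `META-INF/container.xml` to the OPF file and returns the spine’s files in order. It assumes a well-formed package that uses the standard container and OPF namespaces; the function name `spine_documents` is my own.

```python
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub_path):
    """Return the content files in reading order, as listed in the OPF spine."""
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml points at the package (OPF) file inside the archive.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    # Map manifest ids to file paths (hrefs are relative to the OPF file).
    hrefs = {item.get("id"): item.get("href")
             for item in opf.findall("opf:manifest/opf:item", OPF_NS)}
    # Each spine itemref refers to a manifest id; keep the spine's order.
    return [hrefs[ref.get("idref")]
            for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)]
```

A spine with a dozen entries for a single short story is a strong hint that the converter split it up on its own.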
The final potential drawback to EPubGen is that it only works with docx files, meaning it won’t convert files from Word 2003 and earlier. It only handles Word 2007 files (and later versions, if there are any?). The upside is that it preserves your formatting. So I suppose you could use a combination of eCub and EPubGen, to try to save time, but I think using eCub and adding back the formatting will ultimately be less painful. The eCub tool also has numerous settings that allow you to control the style of the generated ePub file. Both tools are under active development, so the wrinkles in both could eventually be worked out.
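Since Word 2003 and earlier save .doc files, which are a completely different format from .docx, it can help to confirm which kind you actually have before reaching for EPubGen. This Python sketch checks the file’s contents rather than its extension: a .docx is a zip package containing `word/document.xml`, while a legacy .doc is an OLE2 compound file that starts with a fixed 8-byte signature. That signature is shared by other old Office formats, so treat a `"doc"` answer as "probably a Word 97-2003 file". The function name `word_format` is hypothetical.

```python
import zipfile

# Legacy Word 97-2003 .doc files are OLE2 compound documents, which begin with
# these 8 bytes (as do .xls and .ppt files, so this is a hint, not proof).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def word_format(path):
    """Return "docx", "doc", or "unknown" based on file contents, not the extension."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # A .docx is a zip package whose main text lives in word/document.xml.
            if "word/document.xml" in zf.namelist():
                return "docx"
        return "unknown"
    with open(path, "rb") as f:
        if f.read(8) == OLE2_MAGIC:
            return "doc"
    return "unknown"
```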
In conclusion, once I knew what I was doing, it didn’t take me all that long to produce a nicely formatted version of The Dance, in ePub and Mobi formats, using eCub. So that’s my planned route. The Dance is done, and I figure that if I do a story a day, I’ll have an ePub version of Rymellan 1: Disobedience Means Death in about a week’s time.
If the thought of creating ePub files scares you, there are several companies that will do it for you, but you’ll have to pay for the conversion. Unless you’re a real technophobe or don’t have the time, learning how to do it yourself will be better in the long run. I’ll also mention a tool that will take a PDF file and convert it to an ePub file, so you could convert the PDF file you used for your print book. However, the tool, called pdftoepub, is $149 (not sure in what currency; it’s an Australian company), so you’ll have to decide if it’s worth it to you. I don’t know how well it works, since I haven’t used it.
If you’re going to use Smashwords, I guess another method could be to obtain the ePub file generated by its meatgrinder and pretty that up, but I don’t think you’ll be saving yourself much time over the other methods I’ve discussed. You have to strip out a lot of formatting when creating the source file for meatgrinder, so you’d still have work to do with the ePub file. But it might be a good starting point.
I’ll mention one other tool that’s useful when editing HTML files: Notepad++. You’ll appreciate it when you’re mucking around in HTML files, adding back your formatting. And if you like to do everything manually, here are instructions for creating ePub files by hand.
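For the do-it-yourself route, the core of an ePub 2 file is small: a `mimetype` file, `META-INF/container.xml` pointing to the package (OPF) file, the OPF itself, an NCX table of contents, and your XHTML and CSS. The Python sketch below writes a one-chapter book with the standard `zipfile` module. It’s a bare minimum under my own assumptions (a single chapter, English, a caller-supplied identifier, and hypothetical file names like `chapter1.xhtml`), not a substitute for the by-hand instructions mentioned above, and you’d want to check the result with a validator such as epubcheck before publishing.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:creator>{author}</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="{book_id}"/>
  </head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    <navPoint id="chapter1" playOrder="1">
      <navLabel><text>{title}</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>{title}</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body>
{body}
  </body>
</html>
"""

# Italics come back by wrapping text in <em> in the XHTML and styling it here.
CSS = "em { font-style: italic; }\np { margin: 0; text-indent: 1.5em; }\n"


def xml_text(value):
    """Escape a string for use in XML text or a double-quoted attribute."""
    return escape(value, {'"': "&quot;"})


def build_epub(target, title, author, book_id, body_xhtml):
    """Write a one-chapter ePub 2 file to target (a path or a binary file object).

    body_xhtml must already be well-formed XHTML, e.g. "<p>It <em>began</em>.</p>".
    """
    fields = {"title": xml_text(title), "author": xml_text(author), "book_id": xml_text(book_id)}
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first in the archive and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER)
        zf.writestr("OEBPS/content.opf", OPF.format(**fields))
        zf.writestr("OEBPS/toc.ncx", NCX.format(**fields))
        zf.writestr("OEBPS/style.css", CSS)
        zf.writestr("OEBPS/chapter1.xhtml", XHTML.format(title=fields["title"], body=body_xhtml))
```

The `mimetype` rule is the one wrinkle that makes "just zip the files" not quite enough: many zip tools compress everything or reorder entries, which is why the sketch writes that entry first with `ZIP_STORED`. For the identifier, something like `"urn:uuid:" + str(uuid.uuid4())` from Python’s `uuid` module would do.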
So time well spent today. I now understand the ePub format, and I’m glad that I’ll be able to produce ePub files myself and not have to pay someone to create them for me. I think I’ll take the rest of the day off!