
Gareth53.co.uk - the online home of Gareth Senior

Converting A Word Doc Into A Kindle E-Book

5:49p.m., Thu 19 May 2011

Last week Emma decided to publish her novel "Unfamous" as an e-book on Amazon's Kindle store. Converting the book into the Kindle format could have been a real pain, but it proved to be a relatively straightforward and fun little exercise. It led to a lot of messing around with the tech - reading between the lines of the documentation and various online guides, some guesswork and quite a lot of just 'suck-and-see'.

This blog post is a collection of the things I learned. No real shocking revelations, just some observations and a few 'gotchas'. If you're reading this now, I hope it's useful. It's from my viewpoint as a developer, so it assumes a knowledge of HTML (Hypertext Markup Language) and CSS (Cascading Style Sheets).

Converting Your Book into an Amazon Kindle-friendly Format (i.e. HTML)

Amazon claim to accept a variety of formats of book:

  • Word Documents (.doc)
  • ePub (.epub)
  • Plain Text Files (.txt)
  • MobiPocket (.mobi or .prc)
  • Adobe PDF (.pdf)

But what it all actually boils down to is that you have to convert these files to the Mobi format via HTML. If the HTML requires external resources like CSS and images, you can zip the files together and convert the zipped file to Mobi format.
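As a rough sketch of the zipping step, the Python below bundles an HTML file and its CSS and images into one archive. The function name `zip_book` and the folder layout are hypothetical, and whether the converter expects the HTML file at the top level of the archive is an assumption worth checking against Amazon's guidelines.

```python
import zipfile
from pathlib import Path


def zip_book(source_dir, archive_path):
    """Zip every file under source_dir, keeping paths relative to it.

    Returns the list of names stored in the archive.
    """
    root = Path(source_dir)
    archive = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir
            if path.is_file() and path.resolve() != archive:
                name = path.relative_to(root).as_posix()
                bundle.write(path, name)
                names.append(name)
    return names
```

Calling `zip_book("book", "book.zip")` on a folder holding `book.html`, `style.css` and an `images` directory keeps the relative paths intact, so links from the HTML to the stylesheet and images should still resolve.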

The '.prc' format is "Palm Resource Code" - a file type that's been most commonly used developing for Palm Pilot PDAs.

Converting from Word to Kindle Format

In our case we were converting from Word. This means converting to HTML using Word's "Save as HTML" feature and then using the free software MobiPocket Creator to convert the HTML to .prc format.

This will surely make any self-respecting developer shudder - somewhere a puppy is crying. Word includes all kinds of proprietary and extraneous markup when you do this - tags that wrap words it thinks are typos as well as strings it thinks are dates/times and place-names. It also makes liberal use of inline styles. It's your common or garden tag soup of the day.

A handful of formatting features will survive the conversion: indentations (spaces and/or tabs), bold characters, italics and headings, as well as page breaks.

Some notable stuff that WON'T survive the conversion includes bullet points (although the Kindle can display bulleted lists), special fonts, headers and footers.

About MobiPocket Creator

MobiPocket Creator is a free download from mobipocket.com that will convert HTML files into the eBook .prc format. It's only available for Windows.

It also features some useful functionality that will automatically create a table of contents, front page and book cover.

The Kindle Previewer Application

The Kindle Previewer is a tool provided by Amazon which emulates how a book displays on the Kindle device.

Kindle's HTML Support

Using some find-and-replace magic I cleaned the Microsoft-making-puppies-cry HTML. Emma's main gripe was that paragraphs were ALWAYS indented and she wanted the first paragraph after each heading to start in line with the left edge of the headings. I was pretty sure this was achievable with some simple CSS.
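The post doesn't record the exact replacements used, so the following is only a sketch of the kind of find-and-replace clean-up that Word's HTML tends to need. The function name `clean_word_html` is hypothetical, and the patterns are assumptions about typical Word output (conditional comments, namespaced tags such as `<o:p>` and `<st1:place>`, inline `style` attributes, `class=Mso...` and `lang` attributes).

```python
import re


def clean_word_html(html):
    """Strip common Word-export clutter from an HTML string (a rough sketch)."""
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # Remove namespaced tags like <o:p> or <st1:place>, keeping the text inside
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # Remove inline styles, Word's Mso* classes and lang attributes
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html, flags=re.IGNORECASE)
    html = re.sub(r"\s+class=\"?Mso\w*\"?", "", html, flags=re.IGNORECASE)
    html = re.sub(r"\s+lang=\"?[\w-]+\"?", "", html, flags=re.IGNORECASE)
    # Unwrap attribute-less spans, innermost first, until nothing changes
    bare_span = re.compile(
        r"<span>((?:(?!</?span).)*?)</span>", flags=re.DOTALL | re.IGNORECASE
    )
    previous = None
    while previous != html:
        previous = html
        html = bare_span.sub(r"\1", html)
    return html
```

Regular expressions are a blunt tool for HTML, so it's worth diffing the output against the original and checking the result in the previewer before trusting it.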

The Kindle supports a very small subset of HTML. The supported tags are also only partially supported - with only a subset of the valid attributes allowed.

HTML Tag | Notes
<a> | Hyperlinked text
<b> | Bold
<big> | Increases font size by one point.
<br /> | Line break
<cite> | Inline version of blockquote - by default formatted as italic.
<del> | Indicates deleted text; enclosed text is displayed with a line through it.
<dfn> | Used to indicate a term being defined for the first time in the text. Formats enclosed text as italics.
<em> | Emphasizes the enclosed text; generally formatted as italic.
<font> | Determines the appearance of the enclosed text.
<i> | Italic
<img /> | Image, supported attributes: align, border, height, src, width.
<s> | Strikethrough
<small> | Reduces font size of enclosed text to one point smaller than the current font size.
<span> | Applies defined attributes to in-line text.
<strike> | Formats text as strikethrough. See also <s>.
<strong> | Formats enclosed text as bold. See also <b>.
<sub> | Subscript - reduces the font size and drops it below the baseline.
<sup> | Superscript - reduces the font size and places it above the baseline.
<tt> | Teletype (monospaced) text
<u> | Underlined.
<var> | Indicates a variable name. By default, text is formatted as italic.
<blockquote> | Quoted text - used for a block of text.
<center> | Centres text horizontally
<code> | Denotes programming code
<dl>, <dt> and <dd> | Definition list, definition term and definition
<div> | Defines a division or section of a document
<h1> to <h6> | Section heading: <h1> (largest) through to <h6> (smallest)
<hr /> | Creates a horizontal "rule" or line.
<ol>, <ul> and <li> | Ordered list (ol), unordered list (ul) and list items (li)
<p> | Paragraph of text, by default the first line is indented

Kindle also recognizes the basic html, head and body tags as well as base, style, title, meta and script.

The big omission here is the table (and all allowed child elements). Amazon recommends that tabular information be converted into an image and included that way.

In terms of supported attributes, the id attribute is allowed on all tags, and the class attribute on most, if not all, elements.

Proprietary Amazon Markup

There's also a handful of proprietary Amazon Kindle tags...

Tag | Notes
<mbp:pagebreak /> | Forces a page break.
<mbp:nu> | Formats text as "not underlined", overriding <u> tags and <font> tag attributes
<mbp:section> | Defines a book 'section'

Character Set Support

Amazon Kindle Direct Publishing supports text in the 'Latin-1' (ISO-8859-1) character set with a handful of exceptions - the 'suit' characters: spades, clubs, hearts (although it will display a diamond), the up-arrow, down-arrow and the Greek letters: alpha, beta and gamma.
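One way to catch problem characters before uploading is to scan the text for anything outside Latin-1. This is a minimal sketch that takes the Latin-1 claim literally; the function name `non_latin1_characters` is hypothetical.

```python
def non_latin1_characters(text):
    """Map each character outside ISO-8859-1 to the index where it first appears."""
    offenders = {}
    for index, char in enumerate(text):
        # Latin-1 covers code points 0 to 255; anything above is out of range
        if ord(char) > 0xFF and char not in offenders:
            offenders[char] = index
    return offenders
```

Note that this also flags the typographic quotes and dashes that Word inserts automatically, since they sit outside Latin-1 too. How any flagged character actually displays is best checked in the Kindle Previewer.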

Kindle's CSS Support

When transferring a document to the Kindle, as with HTML, only a small subset of CSS is supported, and an even smaller subset is useful.

Supported CSS Properties and Values

Property | Values
font-size | xx-small, x-small, small, medium, large, x-large or xx-large
font-style | normal, italic or oblique
font-weight | normal or bold
vertical-align | sub or super
text-align | left, right, center or justify
text-decoration | underline, line-through or none
text-indent | a px value or a percentage

About Fonts

Note that you have some control over fonts using the HTML <font> tag and/or the CSS font-size (strictly relative sizes), font-style (for italics) and font-weight (for bolding) properties. You cannot control the actual font used - the out-of-the-box Kindle has only two fonts available: "Caecilia" (a serif font) and "Neue Helvetica" (a sans-serif font) - and you can't choose between these for your ebook, which will use the serif font Caecilia. The sans-serif font is reserved for other purposes.

Default Styling

The default font sizes seem to be fine - font sizes are all relative and, as you'd expect, headings are generally more prominent than standard paragraphs - all headings are bolded.

The most annoying thing about the default styling is that all block-level elements (apart from headings) appear to be indented by a handful of characters' worth of space - paragraphs, lists, blockquotes, everything. Clearing this means setting text-indent to 0.

Here's a screengrab of the preview of simple h1-h6 heading tags - no CSS applied.

a demo of how the Kindle renders h1 to h6 tags

A simple unordered (bulleted) list

a demo of how the Kindle renders an HTML unordered list

"Sub" and "sup" tags.

a demo of how the Kindle renders sub and sup tags

Experimenting With CSS Selectors

This, for me, was the most interesting area of the Kindle/MobiPocket Creator technologies. I coded the CSS by hand and was intrigued to see how powerful the CSS engine was. Not very, as it turns out.

Emma wanted the first paragraph after each heading to be NOT indented. Using CSS as detailed by the specification you can achieve this by using a sibling selector:

h2 + p {
    text-indent: 0;
}

Most modern browsers (Internet Explorer 9, Google Chrome, Firefox & Opera) support this. Did this work in the Kindle? Not a chance! Not that I expected it to... but it led me to experiment with selectors.

Supported CSS Selectors

What it comes down to is that you're limited to the following selectors:

Type selectors

For example...

h1 {
    font-size: small;
}

will style all <h1> tags as described in the CSS specification. Fine. What won't work is descendant selectors like this...

h1 span {
    font-size: small;
}

I found that declarations like this started affecting all h1 tags, regardless of whether a span element was present. A declaration like this overrode any earlier CSS using the h1 as a selector.

ID Selectors

For example...

h1#main {
    font-size: small
}

...will style the <h1> tag with id="main" as described in the CSS specification. This is the most straightforward, bullet-proof way of applying a CSS style.

Class Selectors

For example....

p.first {
    text-indent: 0
}

...will style all paragraph elements with the class attribute value set to "first".

I'd recommend that you stick to a single class per element though. I discovered some weirdness when using multiple classes.

All other CSS selectors, as far as I can see, are NOT supported. No pseudo selectors, sibling selectors or attribute selectors :(

Here's a full table of support:

Selector | Example | Description | Kindle Support
Universal Selector | * | Matches any element | No support
Element Selector | h1 | Matches any h1 element | Supported
Pseudo-class Selector | p:first-child | Matches any p that is the first child of its parent | No support
Descendant Selector | h1 span | Matches any span that is a descendant of an h1 | Buggy (to the point of being useless)
Child Selector | p > span | Matches any span that is a DIRECT child of a p (but not grandchildren) | No support
Sibling Selector | h1 + p | Matches any p that is immediately preceded by an h1 (i.e. the first paragraph after an h1) | No support
Attribute Selector #1 | h1[class] | Matches any h1 that has a class attribute (regardless of the value) | No support
Attribute Selector #2 | h1[lang="fr"] | Matches any h1 that has a lang attribute with a value of "fr" | No support
Class Selector | h1.scientific | Matches any h1 that has a class attribute with a value of "scientific" | Buggy (but usable)
ID Selector | h1#scientific | Matches any h1 that has the ID attribute with a value of "scientific" | Supported
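Given how narrow that support is, a quick check of a stylesheet before conversion could save some head-scratching. The sketch below flags any selector that isn't one of the three shapes reported as usable above (`h1`, `h1#main`, `p.first`). The name `risky_selectors` is hypothetical, and the CSS parsing is deliberately naive: it assumes flat rules with no @media blocks or other nesting. Comma-grouped selectors weren't covered by my tests, so the sketch simply checks each part separately.

```python
import re

# Selector shapes reported as working: element, element#id, element.class
SAFE_SELECTOR = re.compile(r"^[A-Za-z][A-Za-z0-9]*(?:[#.][A-Za-z_][\w-]*)?$")


def risky_selectors(css):
    """Return the selectors in css that fall outside the shapes reported to work."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # strip comments
    risky = []
    for selector_group in re.findall(r"([^{}]+)\{[^{}]*\}", css):
        for selector in selector_group.split(","):
            selector = " ".join(selector.split())  # normalise whitespace
            if selector and not SAFE_SELECTOR.match(selector):
                risky.append(selector)
    return risky
```

Multiple classes on one element (for example `p.first.intro`) are flagged as well, which matches the weirdness mentioned earlier.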

In the interests of full disclosure, here's the markup I tested along with a screengrab of the results.

I didn't expect many of these selectors to be supported but it would have been a pleasant surprise. Given that most ebooks have very basic layout and formatting, the HTML and CSS support is adequate - certainly for the vast majority of Amazon's best-selling titles, which are largely fiction.

Removing The Default Indentation

Just to revisit the problem I mentioned earlier about removing the indentation from each FIRST paragraph. This was solved by manually adding a class to the first paragraph of each chapter. Given that the book only has around ten or eleven chapters, this wasn't such a hardship. The CSS code in the head of the document looks like this:

p.first {
    text-indent: 0;
}

And each relevant paragraph looks like so:

<h2>Part 2: 'Posh Prison Break'</h2>

<p class="first">So I wake up, and it's the morning after the night before....[snip]</p>
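For a book with many more chapters, adding the class by hand would get tedious. A small script could do the job instead. This is only a sketch, not what I did for "Unfamous": the function name `mark_first_paragraphs` is hypothetical, and it assumes already-cleaned HTML in which paragraphs are bare `<p>` tags with no attributes.

```python
import re


def mark_first_paragraphs(html, class_name="first"):
    """Add class_name to each bare <p> that directly follows a closing heading tag."""
    # Capture the closing heading and any whitespace so they are kept unchanged
    pattern = re.compile(r"(</h[1-6]>\s*)<p>", flags=re.IGNORECASE)
    return pattern.sub(rf'\1<p class="{class_name}">', html)
```

Only the paragraph immediately after a heading is touched, so the rest keep the Kindle's default indent.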


 

 

 

When Using Microsoft Word: Tip for Authors

I have one tip for writers who are using Microsoft Word with a view to publishing to the Kindle later on. It's very easy to select fonts, bolding and italics manually throughout. For body text, this is fine. You can fall into the trap of doing this for chapter headings too, but it'll save you some pain later on if you use Word's style formatting ("Heading 1", "Heading 2", "Heading 3") instead. This imbues the document with meaning rather than just visual style information. You can change the appearance of 'Heading 1' and 'Heading 2' later on and the change will apply everywhere the style is used. When you come to export your document and convert it to other formats, it will retain this useful information, saving a lot of hand-cranking of your data.

Use 'style' formatting - it'll save you pain later on....

If you're trying to format a book for the Kindle, I hope you found this blog post useful. And good luck!


About "Unfamous" by Emma Morgan

A B-list celebrity, addled after weeks of partying, throws herself in the Thames. Recovering in a London hospital, she suffers amnesia and commits herself to rehab in an attempt to rediscover her true self. As she pieces together her past she realises her whole life has been a lie and the reality is much more exciting and glamorous and wealthy than she could ever have wished for. The book charts her battle to prove her birthright.

It's available to buy now for the bargain price of 69p - about the same as THREE copies of heat magazine and infinitely more entertaining. It's enjoyable and funny. At least four five people have said they liked it.
