TinyMCE – Paste from Word; Paste as Plain Text

Kurt Rademaekers

TinyMCE provides two tools to facilitate cut & paste of text from a Microsoft Word document or from an existing web page into a new web page being created with the Drupal content management system.

In TinyMCE 3.4.9, the “Paste from Word” feature preserves basic Word document formatting while removing the Word-specific code that silently accompanies a paste from Word and typically breaks the display of a web page.  The “Paste as Plain Text” feature, like pasting into Windows Notepad or Mac TextEdit, strips out all formatting that accompanies content from a Word document or web page.

Symptoms of Non-Cleaned-Up Content from Word

If a Drupal page is displaying incorrectly, one of the first things to determine is whether its content was pasted in from Word without clean-up.

When you paste directly from Word into a TinyMCE editor that has not been configured to anticipate this action, the Word content is—unbeknownst to the casual user—wrapped in a complex set of HTML containing XML code, CSS class references and style attributes.

Since it’s the job of a good HTML editor to shield you from HTML code, TinyMCE hides the complex Word HTML and displays the content as best it can—deceptively cleanly, most often.  TinyMCE does not warn you that you’ve just pasted complex Word-created HTML into the CMS*, and the initial results may look just fine. (*This would be a user-friendly enhancement to TinyMCE.)

The bad news isn’t evident until someone attempts to view that page with a different browser and the page is totally misformatted or appears blank.  Ironically, this latter scenario happens most often when the page is viewed in Microsoft Internet Explorer.

Tip: If the content of a basic page displays improperly, view the page source with your browser, or in Drupal open the page in edit mode to examine the text area HTML source.  If the features are enabled, use the TinyMCE “Edit HTML Source” toolbar button or the “Disable Input” link feature to view the HTML source.

If the HTML source looks anything like the following, the content has been pasted directly from Word with no clean-up. Note the "mso" conditional comments; unfiltered Word markup typically also includes tags prefixed with "o:" and "w:".

<!--[if gte mso 9]><xml><br>
  </xml><![endif]--><!--[if gte mso  9]><xml><br>

If the content has been cleaned up and converted properly, the results should look more like this:

<h1>Quality Assurance Testing Checklist – Back  End</h1>
<h2>Content Publisher Use Cases<br />Testing on  Firefox on MAC</h2>
<table border="1" cellspacing="0"  cellpadding="0">
<td valign="top" width="437"><br>
<p>Use Case</p>
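
The visual check described above can also be roughed out in code. The following is a minimal sketch, not part of TinyMCE or Drupal: the function name findWordMarkers, the list of patterns and the "pasted" sample string are all illustrative assumptions, and the patterns only cover the most common Word leftovers (mso conditional comments, "o:" and "w:" tags, Mso class names and mso- style properties).

```javascript
// Heuristic patterns that commonly show up in HTML pasted straight from Word.
// This list is illustrative, not exhaustive.
const WORD_MARKERS = [
  { name: "mso conditional comment", pattern: /<!--\[if\s+[^\]]*mso/i },
  { name: "Office namespace tag (o: or w:)", pattern: /<\/?(?:o|w):[a-z]/i },
  { name: "Mso class name", pattern: /class\s*=\s*["']?Mso/i },
  { name: "mso- style property", pattern: /style\s*=\s*["'][^"']*mso-/i },
];

// Returns the names of the markers found in the given HTML source.
function findWordMarkers(html) {
  return WORD_MARKERS.filter((m) => m.pattern.test(html)).map((m) => m.name);
}

// Hypothetical example of uncleaned Word markup (not taken from a real page).
const pasted =
  "<!--[if gte mso 9]><xml><w:WordDocument></w:WordDocument></xml><![endif]-->" +
  '<p class="MsoNormal" style="mso-margin-top-alt:auto">Use Case<o:p></o:p></p>';

// Cleaned-up markup along the lines of the second sample above.
const cleaned = "<h1>Quality Assurance Testing Checklist</h1><p>Use Case</p>";

console.log(findWordMarkers(pasted));  // lists all four markers
console.log(findWordMarkers(cleaned)); // empty list
```

An empty result does not prove the content is clean; it only means none of these particular markers were found.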

“Paste from Word” and “Force cleanup on standard paste”

There are two different ways to enable the Paste from Word function in TinyMCE.  For the TinyMCE profile associated with the Drupal input format, do one of the following:

  1. In Buttons and Plugins, check “Paste from Word” to enable that button on the TinyMCE toolbar.  The user can then copy the text from Word, click the “Paste from Word” tool, paste the content into the popup window and click Save.  The text is then stripped of the special Word XML and CSS code and pretty cleanly pasted into the text editor as standardized HTML, although it may add extra spacing between paragraphs.
  2. In Cleanup and Output, check “Force cleanup on standard paste”.  With this enabled, TinyMCE assumes that any direct paste into the text area may be content from Word, and so it always evaluates it for clean-up in the same way as the "Paste from Word" tool above.

I have not encountered any conflicts between option 2 and other TinyMCE functions, so I recommend option 2 over option 1. Forcing cleanup on standard paste relieves users of having to remember that they’re pasting from Word; that scenario is handled automatically and transparently.
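
For readers who configure TinyMCE 3.x directly rather than through the Drupal profile screens, the two options might look roughly like the settings below. This is a sketch under assumptions: I am assuming the Drupal checkbox “Force cleanup on standard paste” corresponds to the paste plugin's paste_auto_cleanup_on_paste setting, and the toolbar contents and the names optionOne and optionTwo are purely illustrative.

```javascript
// Option 1: expose the "Paste from Word" toolbar button (pasteword).
const optionOne = {
  mode: "textareas",
  theme: "advanced",
  plugins: "paste",
  theme_advanced_buttons1: "bold,italic,underline,|,pasteword",
};

// Option 2: clean up every standard paste (Ctrl+V); no extra button needed.
const optionTwo = {
  mode: "textareas",
  theme: "advanced",
  plugins: "paste",
  theme_advanced_buttons1: "bold,italic,underline",
  paste_auto_cleanup_on_paste: true,
};

// Only initialise when TinyMCE is actually loaded on the page.
if (typeof tinyMCE !== "undefined") {
  tinyMCE.init(optionTwo);
}
```

In a Drupal site the profile settings generate this configuration for you, so the sketch is mainly useful for understanding what the checkboxes change.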

A simple Word document will typically transfer well.  TinyMCE will eliminate a lot of special formatting and will attempt to apply basic HTML tags such as H2 and H3 tags, which Word has probably styled via CSS.  The result—with little or no effort—is a document whose font styles are consistent with the rest of your site.

As noted above, when pasting from a simple Word document, paragraph spacing may be doubled from the original and you may lose text color attributes, so these may require some manual clean-up in the post-paste content.

On the positive side, I find that TinyMCE typically transfers bolding, italics, underlining, strike-throughs and tables properly.

When It Doesn’t Go So Smoothly

A complex or improperly formatted Word document is unlikely to transfer well when using TinyMCE “Paste from Word”.

Sometimes it’s a struggle with Word to get your document to look the way you'd like.  If you’ve inserted your own bullet point characters (rather than using a Word-standard way of doing that) or manually aligned content using spaces, tabs and inserted line breaks, when you paste your Word document into TinyMCE it’s unlikely to format as it did in Word.

A page on your website is unlikely to provide the same content width as the Word document, so manual line breaks won’t be positioned properly.  HTML reduces a series of space characters to one space, so indentation created with spaces will go away.  You may find that font styles transfer when you don’t want them to or vice versa.
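
To see why space-based indentation disappears, here is a small sketch that approximates how a browser treats whitespace in ordinary text. The function name collapseWhitespace and the sample string are illustrative assumptions; real browsers apply more detailed rules, but the visible effect is the same.

```javascript
// Approximates how a browser renders normal text: runs of spaces, tabs and
// line breaks collapse into a single space.
function collapseWhitespace(text) {
  return text.replace(/[ \t\r\n]+/g, " ").trim();
}

// Hypothetical text aligned by hand in Word with spaces, a tab and line breaks.
const fromWord = "Item one\n        indented with spaces\n\tindented with a tab";

console.log(collapseWhitespace(fromWord));
// "Item one indented with spaces indented with a tab"
```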

There are two basic approaches to this situation.

  1. Let the TinyMCE "Paste from Word" function do what it will, then manually clean up the document formatting in the CMS text editor (TinyMCE).
  2. Paste the Word contents into Windows Notepad or Mac TextEdit, which will eliminate all font styling.  Copy all from Notepad or TextEdit and paste into the CMS text area, then manually apply the desired formatting.

Tip: With a given document, try approach 1, then approach 2.  See which method gives you a result that requires the least manual reformatting.

"Paste as Plain Text"

The TinyMCE “Paste as Plain Text” function lets you paste content from a Word document or web page with the same effect as pasting into Windows Notepad or Mac TextEdit.  Pasting with this function strips out all hidden HTML, XML and CSS—and therefore all font styling and document formatting except line breaks—from either content source and leaves you with plain, unformatted text.
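
The effect can be illustrated with a small sketch that reduces HTML to plain text while keeping line breaks. This is not TinyMCE's actual implementation; the function name htmlToPlainText, the choice of block tags that become line breaks and the handful of entities decoded are all simplifying assumptions.

```javascript
// Reduces an HTML fragment to plain text, keeping only line breaks.
function htmlToPlainText(html) {
  return html
    // Drop comments, including Word's conditional comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop embedded CSS, scripts and Office XML together with their contents.
    .replace(/<(style|script|xml)\b[\s\S]*?<\/\1\s*>/gi, "")
    // Turn explicit and block-level breaks into newlines.
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/(p|div|h[1-6]|li|tr)\s*>/gi, "\n")
    // Remove every remaining tag, and with it all classes and styles.
    .replace(/<[^>]+>/g, "")
    // Decode a few common entities (&amp; last, to avoid double decoding).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    // Tidy up trailing spaces and repeated blank lines.
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{2,}/g, "\n")
    .trim();
}

// Hypothetical Word-style fragment.
const wordHtml =
  '<p class="MsoNormal"><b>Use Case</b><o:p></o:p></p>' +
  '<p style="color:red">Second&nbsp;line<br>third &amp; last</p>';

console.log(htmlToPlainText(wordHtml));
// Use Case
// Second line
// third & last
```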

I find this is an under-documented and non-intuitive feature, but useful nonetheless.

To enable “Paste as Plain Text”:

  1. In Cleanup and Output, check “Force cleanup on standard paste”**. 
  2. In Buttons and Plugins, check “Paste text” to enable that button on the TinyMCE toolbar.

**In TinyMCE 3.4.9, I found I had to enable this setting despite the accompanying instruction that says it makes “the default paste function (CTRL-V or SHIFT-INS) behave like the 'paste from word' plugin function”.

To use “Paste as Plain Text”, copy content from a Word document or web page and click the “Paste as Plain Text” tool. Notice how the button appears in a "pushed in" state.

The first time you click the "Paste as Plain Text" button (and only the first time) you’ll see this message: “Paste is now in plain text mode.  Click again to toggle back to regular paste mode.  After you paste something you will be returned to regular paste mode.” Got it?

Interpretation of these vague terms:

  • If the button is pushed in it’s in “plain text mode”, and special formatting from a Word document or web page will get stripped out when you paste into the text area. 
  • If the button is out it’s in “regular paste mode”, which means all the extraneous HTML format code you don't want will get pasted in along with the text.
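
The toggle behaviour described by that message can be modelled as a tiny state machine. The class below is a hypothetical sketch of the behaviour as I observed it, not TinyMCE code; the name PasteModeToggle and the naive tag stripping are assumptions made for illustration.

```javascript
// Models the "Paste as Plain Text" button: a click flips the mode, and a
// paste always returns the editor to regular paste mode.
class PasteModeToggle {
  constructor() {
    this.plainTextMode = false; // button "out": regular paste mode
  }

  // Clicking the toolbar button flips the mode and reports the new state.
  clickButton() {
    this.plainTextMode = !this.plainTextMode;
    return this.plainTextMode;
  }

  // Pasting uses the current mode, then falls back to regular paste mode.
  paste(html) {
    const result = this.plainTextMode ? html.replace(/<[^>]+>/g, "") : html;
    this.plainTextMode = false;
    return result;
  }
}

const toggle = new PasteModeToggle();
toggle.clickButton();                                  // button pushed in
const first = toggle.paste("<b>Bold</b> text");        // "Bold text"
const second = toggle.paste("<b>Bold</b> text");       // markup kept
console.log(first, "|", second);
```

The TinyMCE 3.x paste plugin also documents a paste_text_sticky setting that is meant to keep plain text mode switched on after a paste; I have not tested it with the Drupal profile described here.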

Configuration Recommendations

For most installations I believe the most user-friendly configurations are:

  1. Enable “Force cleanup on standard paste”.  This relieves users of having to remember that they’re pasting from Word; that case is handled automatically and transparently.
  2. Don’t enable “Paste from Word”.  On top of “Force cleanup on standard paste”, it’s redundant.  By itself it requires an additional step that users can easily forget.
  3. Enable “Paste as Plain Text”.  You’ll know how to use it, and you’ll be able to explain it to your users or co-workers.
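
Expressed as raw TinyMCE 3.x settings, these three recommendations might combine as follows. As before, this is a sketch: the mapping from the Drupal checkboxes to paste_auto_cleanup_on_paste and to the pastetext button is my assumption, and the other toolbar buttons are placeholders.

```javascript
// Recommended combination: automatic cleanup on paste, a "Paste as Plain
// Text" button (pastetext), and no "Paste from Word" button (pasteword).
const recommended = {
  mode: "textareas",
  theme: "advanced",
  plugins: "paste",
  paste_auto_cleanup_on_paste: true,
  theme_advanced_buttons1: "bold,italic,underline,|,pastetext",
};

const buttons = recommended.theme_advanced_buttons1.split(",");
console.log(buttons.includes("pastetext")); // true
console.log(buttons.includes("pasteword")); // false
```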
