Pasting from Word to a CMS


“Why can’t I paste from Word?”

This question comes up surprisingly often from our clients, so I thought a little explanation would be useful. The issue arises when a client writes the content for their site in Word, then uses copy-and-paste to transfer it into our CMS and add it to their site. All the copy comes through OK, but it often looks wrong: the spacing, the fonts and the font sizes don't match the rest of the site.

Small disclaimer: this issue doesn’t just affect the Black Square CMS – it affects every CMS that uses a WYSIWYG content editor to allow the client to format the copy themselves.

The issue arises, essentially, because we’re trying to use a tool in a manner other than that intended by its creator. To explain: the copy-and-paste tool was originally designed to copy only text, and was the same tool for every application. It could therefore copy any text from any one application to any other, and it would do the same job for all of them. As the needs of the computer-using public grew, however, the copy-and-paste function needed to evolve to copy more than just text. When using a file manager (eg: Windows Explorer), it is very useful to be able to copy-and-paste whole files. When using a photo editing suite (eg: Photoshop), it is very useful to be able to copy-and-paste images, layers, or parts of images.

In short, the copy-and-paste function now allows each application to define for itself exactly what gets copied. The copy-and-paste action, however, is not restricted to the same application. You can copy something in one application, and attempt to paste it into any other application. If the target application can process the type of information on the clipboard, the paste will work.
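To make that a little more concrete, here is a toy model in Python. It is only a sketch: the clipboard is represented as a plain dictionary, the format names are borrowed from MIME types for readability, and the `paste` function and the three "applications" are all made up for illustration. Real operating systems use their own clipboard format identifiers.

```python
# A toy clipboard: one copy action can leave the same content in several formats.
# The keys are illustrative format names, not real platform identifiers.
clipboard = {
    "text/html": '<p style="font-family: Calibri">Hello <b>world</b></p>',
    "text/plain": "Hello world",
}


def paste(clipboard, accepted_formats):
    """Return (format, payload) for the first format the target accepts.

    accepted_formats is listed in the target's order of preference.
    Returns None when the target understands nothing on the clipboard.
    """
    for fmt in accepted_formats:
        if fmt in clipboard:
            return fmt, clipboard[fmt]
    return None


# Three hypothetical paste targets and the formats each one understands.
rich_editor = ["text/html", "text/plain"]
plain_editor = ["text/plain"]
file_manager = ["application/x-file-list"]

print(paste(clipboard, rich_editor))
print(paste(clipboard, plain_editor))
print(paste(clipboard, file_manager))
```

The rich editor picks up the HTML version, the plain editor gets only the words, and the file manager gets nothing at all because it does not understand anything on offer.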

Formatting in Word

Now, a Word document contains more than just text. That text is carefully formatted with specific fonts, sizes, colours and spacing. When the designers of Word looked at the copy-and-paste function, they realised that a Word user copying text from one document to another wants to keep the formatting of the original text, not just the words. They therefore needed a way to encode the formatting and the words together on the clipboard, so the target application would be able to use it all, and one of the formats they chose for this was HTML.

HTML is a sensible choice. It is an international standard, and is designed for just this kind of job. HTML stands for HyperText Markup Language; it describes the structure of a piece of text (headings, paragraphs, emphasis, etc), while the style rules that accompany it (CSS) describe its appearance (fonts, sizes, colours, spacing, etc). In addition, HTML is closely related to XML (both descend from an older standard, SGML), and Microsoft has decided that, in order to make its applications more universal, it should move towards the extensive use of XML for encoding its documents.

WYSIWYG Editors

WYSIWYG content editors in CMSs also work with HTML, as it is the language used to encode webpages. The problem is that the editor is trying to create a small part of the code for the page, rather than the whole thing. The page itself defines a whole range of standard styles which apply to all copy across the site, unless a piece of copy specifically defines its own style. So the editor has no problem with pasting simple text, and it would also have no problem if you pasted HTML that defined only headings, italics, bold text and other simple formatting like that.

Pasting from Word, however, doesn’t paste just the simple stuff. For every line, it pastes exactly what font, size, colour and spacing it requires. This overrides the global settings in your site, and makes the copy on the site look (or at least try to look) exactly like it did in the Word document.
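To show what that looks like in practice, here is a small Python sketch that lists every inline style declaration in a piece of HTML. Please treat it as an illustration only: the `word_like` sample is hand-written to resemble the kind of markup Word tends to produce (it is not real Word output), and the `InlineStyleFinder` class and `inline_styles` function are names invented for this example.

```python
from html.parser import HTMLParser


class InlineStyleFinder(HTMLParser):
    """Collect (tag, property, value) for every inline style declaration."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                # A style attribute is a ';'-separated list of 'property: value'.
                for declaration in value.split(";"):
                    if ":" in declaration:
                        prop, val = declaration.split(":", 1)
                        self.found.append((tag, prop.strip(), val.strip()))


def inline_styles(html):
    """Return the inline style declarations found in an HTML fragment."""
    finder = InlineStyleFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hand-written sample resembling Word-style clipboard markup (an assumption).
word_like = (
    '<p class="MsoNormal" style="margin-bottom:8pt;line-height:107%">'
    '<span style="font-size:11pt;font-family:Calibri;color:black">'
    "Our <b>new</b> range</span></p>"
)

# The same copy as simple markup, which leaves all styling to the site.
clean = "<p>Our <b>new</b> range</p>"

for tag, prop, val in inline_styles(word_like):
    print(f"<{tag}> sets {prop} to {val}")

print(len(inline_styles(clean)), "inline styles in the clean version")
```

Both fragments contain exactly the same words. The first carries five style declarations of its own, each of which takes priority over the site's global styles; the second carries none, so the site's styles apply.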

Solutions?

The solution to this problem that leaps first to mind is simple: make the style of your Word document the same as the website. Unfortunately, this solution has several problems. First off, you are requiring that every person who writes copy for the site has a Word template exactly like the website. You’ll have to create one of these, distribute it to every writer, then ensure that they all stick to it. The other problem is the killer: if you ever want to change the style of the website, even slightly, you’re in for a nightmare job. You’ll need to re-format all the Word documents containing the copy of your site, then re-copy-and-paste each page to overwrite the existing formatting. That doesn’t sound like fun at all.

The only other practical solution is to paste just the text, without any of the formatting, then apply the formatting directly in the editor. Advanced WYSIWYG editors (including the TinyMCE editor used in the Black Square CMS) will have a ‘paste as plain text’ function, which allows you to do this easily. Otherwise, you need to open a simple text editor (eg: Notepad), paste the copy from Word into there, then copy that text and paste it into your editor.
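For the curious, the effect of pasting as plain text can be sketched in a few lines of Python. This is not how TinyMCE or Notepad actually do it; it is a minimal illustration using Python's standard HTML parser, and the `TextOnly` class, the `to_plain_text` function and the sample markup are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Closing one of these tags ends a line of text (a simplifying assumption).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Keep the words from an HTML fragment and discard all the markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(html):
    """Return the text of an HTML fragment, one line per block, no formatting."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)


# Hand-written sample resembling Word-style clipboard markup (an assumption).
pasted = (
    '<p class="MsoNormal" style="font-family:Calibri">'
    '<span style="font-size:11pt">Our <b>new</b> range</span></p>'
    '<p class="MsoNormal">Now in &amp; out of stock</p>'
)

print(to_plain_text(pasted))
```

Everything that could override the site's styles is gone, and so is the bold on "new". That is the trade-off of this approach: any formatting you do want has to be re-applied in the editor.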

It would be really easy to blame Microsoft for this failure, but unfortunately, this time, the blame does not lie with them. The designers of Word designed the copy-and-paste feature to work with Word, and it does that exactly as required. We are trying to use this feature outside of its intended application and, while it is tantalisingly close to working well, unfortunately it doesn’t quite fit our needs.
