Web Authoring Tips & Tools

First, what everyone is looking for: the free tools. So here is one, a free UTF-8 convertor tool. I wrote it using strictly Win32 API calls, so it should work on any version of Windows from Win98 and NT 4.0 on up (ME, 2000, XP). I developed it on an XP system, but the Win32 API is common to all of the listed OSes.

This is easy to use. It even gives the Windows character for your code page for each converted UTF-8 byte, so you can just select, copy, and paste these bytes into your HTML document. Why? Simple: most editors do NOT support UTF-8. They typically display every byte as a character from the Windows character set for the current code page. But most current browsers can handle UTF-8 data and display the characters properly. (The charset=UTF-8 value in the page's meta tag specifies this.)

So what is UNICODE? It is an international standard for representing the characters of all written languages. But to accomplish that you need more than the 256 values that fit in one byte. So along comes UTF-8, a method for encoding these higher values into byte-wise data such that the original values can be decoded back. And it neatly makes ASCII become … the bytes they were originally. But that is where things take two different paths. For the western European languages (Latin alphabet based), Latin-1 (ISO 8859-1) works well. It uses the values from 0xA0 to 0xFF for the various letters with diacritical marks and other symbols. UTF-8 encodes these values into 2 bytes. With 8859-1 every byte is rendered as one character; with UTF-8, multi-byte sequences are recognized and decoded (converted) to the value of the corresponding UNICODE character. UTF-8 supports the Middle Eastern, Asian, etc. languages all in byte-wise data. Raw UNICODE values do not, since most of them do not fit in a single byte.
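To make the encoding concrete, here is a minimal Python sketch of how a UNICODE value is split into UTF-8 bytes. It is an illustration only, not the code of the convertor tool, and the function name utf8_encode is made up for this example.

```python
def utf8_encode(code_point):
    """Encode one UNICODE value as UTF-8 bytes, by hand (illustrative sketch)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid UNICODE character value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx, so ASCII stays exactly as it was
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx (covers the Latin-1 letters 0xA0 to 0xFF)
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(utf8_encode(0x41).hex())   # 41   (ASCII "A" is unchanged)
print(utf8_encode(0xE9).hex())   # c3a9 (e with acute accent becomes 2 bytes)
```

Python's built-in chr(value).encode("utf-8") gives the same bytes; the hand-written version is only there to show where the bits go.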

How do you use this? Look up the desired character in the UNICODE charts and enter its value in the convertor. Click the 'Convert' button and you get the hex byte values, plus displayable characters for these bytes. (They will be some odd characters, but just copy and paste them into your document.) There is a nice chart describing the UNICODE to UTF-8 mapping in Wikipedia, and I based my convertor tool on this chart. And "UTF-8 stands for Unicode Transformation Format-8. It is an octet (8-bit) lossless encoding of Unicode characters."
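The steps above can be sketched in a few lines of Python. This is a rough stand-in for what the convertor does, not its actual Win32 code. The function name convert and the default code page cp1252 (the usual western European Windows code page) are assumptions for the example.

```python
def convert(code_point, code_page="cp1252"):
    """Return (hex byte values, pasteable characters) for one UNICODE value.

    Hypothetical helper: mimics the 'Convert' button, assuming the
    Windows code page is cp1252.
    """
    utf8_bytes = chr(code_point).encode("utf-8")
    # The hex byte values, e.g. "C3 A9"
    hex_values = " ".join(f"{b:02X}" for b in utf8_bytes)
    # Each UTF-8 byte shown as the code page character with that byte value.
    # Bytes the code page leaves undefined come out as a replacement mark.
    pasteable = utf8_bytes.decode(code_page, errors="replace")
    return hex_values, pasteable


hex_values, pasteable = convert(0xE9)   # 0xE9 is e with acute accent
print(hex_values)                       # C3 A9
print(len(pasteable))                   # 2 (one odd character per byte)
```

Pasting those two odd characters into a document saved in the Windows code page writes the bytes C3 A9 to disk, which a browser reading the page as UTF-8 decodes back to the single accented letter.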

A quick quiz. This page is UTF-8 and I have used one special character. Unless you are viewing this with a very old browser (say from the mid-1990s or older) you will see the character normally. (Those stuck in the past are seeing some garbage symbols in the page.) Those with current browsers, try to find the character. Give up? It's up in the "So what is UNICODE?" paragraph, where I say "And it neatly makes ASCII become …"

A note on the convertor application. The ability to show printable (hence copyable) characters appears to come from the fact that the Windows character set for your code page has printable characters for most bytes from 0x80 to 0xFF, so they can be copied. ISO 8859-1, on the other hand, has no printable characters for 0x80 through 0x9F, and those values fall inside the 0x80 to 0xBF range used for the trailing bytes of multi-byte UTF-8 characters!
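A quick way to check this is to ask which byte values from 0x80 to 0xFF have no visible character in a given character set. The sketch below assumes Python's cp1252 codec as a stand-in for the Windows code page and latin-1 for ISO 8859-1. The function name bytes_without_glyph is made up for the example.

```python
import unicodedata


def bytes_without_glyph(code_page):
    """List byte values 0x80..0xFF that are undefined or control codes."""
    missing = []
    for value in range(0x80, 0x100):
        try:
            ch = bytes([value]).decode(code_page)
        except UnicodeDecodeError:
            missing.append(value)       # byte is not assigned in this code page
            continue
        if unicodedata.category(ch) == "Cc":
            missing.append(value)       # assigned, but only to a control code
    return missing


print([hex(v) for v in bytes_without_glyph("cp1252")])
# ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
print(len(bytes_without_glyph("latin-1")))
# 32 (all of 0x80 through 0x9F)
```

So under cp1252 only five byte values cannot be shown and copied, while under ISO 8859-1 the whole 0x80 to 0x9F block is missing.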

Or, to make it easy, here is the convertor as a "Cloud Computing" application.

This page is a work in progress. So come back for more in the future.