I have a problem with exported HTML content.
1. The What's New says that RV supports multiple charsets for exporting. What does "multiple charsets" mean?
2. The charset of the first style item is used as the encoding charset for the HTML. Since Delphi provides only a limited list of charsets in the charset combo box, can RV support all kinds of charsets?
3. If rvsoUtf8 is added to the HTML save options, the exported content is in the UTF-8 charset. But if rvsoUtf8 is not included, the exported HTML contains something like "&#26412;&#31995;&#21015;" for non-English characters. What format is this? What charset is used?
Problem about HTML content
As far as I know, there are two possible ways to create multilanguage HTML files:
1) Using a Unicode encoding, such as UTF-8. This is the best way, and it is supported by TRichView.
2) Using some charset, with characters that are not included in this charset encoded by their codes (such as &#1234;). This may produce a very large HTML file. There is no way to create an HTML file containing multiple charsets without Unicode or these codes. This way is supported by TRichView only partially (that's why the implementation of UTF-8 in HTML was very important). TRichView saves non-Unicode text to HTML as it is, without any conversion. If the text uses different charsets, this may produce flawed HTML files. For Unicode text, all non-English characters are saved as their codes (&#NNNN;). If your document is Unicode (and it must be Unicode if you want to support Asian languages), it will be saved correctly in any charset, but the HTML file will be very large.
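To make the two approaches concrete, here is a small sketch in Python (not TRichView or Delphi code; the sample text and the choice of windows-1252 as the target charset are assumptions made only for illustration). It shows what the &#NNNN; codes are, namely decimal Unicode code points written as HTML numeric character references, and why they make the file larger than UTF-8 does.

```python
import html

text = "本系列"  # sample non-English text, chosen for illustration

# Way 1: UTF-8 stores every character directly (3 bytes each here).
utf8_bytes = text.encode("utf-8")

# Way 2: a single-byte charset (windows-1252 is an assumed example);
# characters it cannot represent become numeric character references.
ref_bytes = text.encode("windows-1252", errors="xmlcharrefreplace")

print(ref_bytes.decode("ascii"))         # &#26412;&#31995;&#21015;
print(len(utf8_bytes), len(ref_bytes))   # 9 24

# A browser resolves the references back to the original characters,
# whatever charset the page itself declares.
decoded = html.unescape(ref_bytes.decode("ascii"))
print(decoded == text)                   # True
```

Each reference is the decimal Unicode code point of one character, so the result displays correctly in any page charset, at the cost of about 8 bytes per character here instead of 3 in UTF-8.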