A tool or an option needed

Post by **Latchezar Mintcheff** » Thu Feb 26, 2004 3:49 am

Congratulations!

Very good program. I recommend it to everyone I meet and talk to about text processing.

But! I recently needed to convert a small part of a book that was (and still is) typeset in a 256-symbol ASCII font to Unicode Times New Roman (or another Unicode font). I have not found any option to do this in Atlantis Ocean Mind.

Isn't there such an option, or do I simply not see it?

It is very important to have such a feature, given that many people have a lot of documents in ASCII and at the same time use Unicode as well.

Greetings from Sofia

Latchezar Mintcheff
Latchezar Mintcheff Publishers

www.latchezarmintcheff.com

Post by **admin** » Thu Feb 26, 2004 9:14 am

Text cannot be "converted from a character set to a font". You can apply any available font to the selected text. But you cannot convert ASCII to Times New Roman. This operation is inappropriate. If you have a plain text document in Bulgarian, and you can read it in Atlantis, you could save it either as RTF or DOC. Other word processors would display it properly too.

Post by **arringtonve** » Fri Feb 27, 2004 9:35 am

Maybe this has nothing to do with the Unicode issue, but it appears that Atlantis is limited to the 256 character set. Sometimes, when preparing a document in Word or OpenOffice, I'll put in a true virgule instead of a slash (for example to create odd fractions such as 12/72), or an fl or fi ligature. These are available in Word and OO even though they're beyond the 256 character set.

When displayed in Atlantis, however, these characters often don't appear properly or appear as question marks.

Post by **admin** » Fri Feb 27, 2004 10:41 am

arringtonve wrote: Atlantis is limited to the 256 character set.

Atlantis is not limited to any character set. As a matter of fact, "the 256 character set" does not exist. There is a set of ANSI character sets intended for different groups of languages and alphabets.

Post by **Latchezar Mintcheff** » Fri Feb 27, 2004 3:24 pm

I know what a character set is and what a font is. I learned these things 25 years ago. Now it may be too late for us to teach each other. It might be of use to you to read my short note, which appears in at least 20 or 30 web and print reference materials. I wrote it apropos of the strange animal that some people call "Russian Cyrillic" and even "Russian Alphabet". Here it is:

The CHARACTER is a separate basic symbol of a given alphabet.

The ALPHABET is a system of characters used for writing and shared by a certain group of people, a nation, a group of nations, countries, etc. The alphabet may be Latin, Cyrillic, Greek, etc.

The CHARACTER SET is a collection of alphabetical and other symbols that satisfies a specific writing system.

The CHARACTER CODE is the machine/computer/program representation (coding) of a specific character or other writing symbol.

The CODEPAGE is a list of selected character codes in a certain order.

The CODE TABLE or CHARACTER TABLE is the table in which a particular codepage's codes and their respective characters (and other symbols) are structured.

The codepage in most cases specifies:

a) the alphabet;
b) the character set;
c) the national (or some other) keyboard layout.

The KEYBOARD LAYOUT is the correspondence between the keys of the keyboard and the order of some alphabet and/or other elements of a specific writing system. The keyboard may be hardware- or software-defined: by the manufacturer or by a codepage.

That's why we have alphabets: Latin, Cyrillic, Greek, etc. This is the main thing. On the other hand, we have codepages: ISO-8859-1 (Latin), Windows-1251 (Cyrillic), DOS-855 (Cyrillic Bulgarian), IBM-866 (Cyrillic Russian), etc. We use them to write on a specific keyboard in a specific language with a specific character set.

The matter is simple. In the 256-symbol fonts the symbols are at certain places, and in the Unicode fonts they are at other places. Adapting a document typeset in a 256-symbol font to a Unicode font is what we call conversion. We can do this with various scripts or commands.
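A minimal Python sketch of such a conversion, assuming the legacy 8-bit font placed its Cyrillic letters at the standard Windows-1251 positions (a font with its own private layout would need a custom mapping table instead). Decoding the 8-bit codes with the matching code page yields the Unicode code points that a Unicode font expects:

```python
# Assumption: the legacy 8-bit font placed its Cyrillic letters at the
# Windows-1251 positions; a font with a private layout needs its own table.
legacy_bytes = bytes([0xD0, 0xEE, 0xE1, 0xE5, 0xF0, 0xF2])

# Decoding with the matching code page yields Unicode characters: "Роберт".
unicode_text = legacy_bytes.decode("cp1251")

# Where each letter "moves": 8-bit slot -> Unicode code point.
for code, char in zip(legacy_bytes, unicode_text):
    print(f"0x{code:02X} -> U+{ord(char):04X}")  # first line: 0xD0 -> U+0420

# The Unicode text can now be stored in any Unicode encoding, e.g. UTF-8.
utf8_bytes = unicode_text.encode("utf-8")
```

The same byte 0xD0 would mean a different letter under another code page, which is why the conversion only works when the source code page is known.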

My simple question was only whether Atlantis Ocean Mind can do this. I understand that it can't, and won't in the future either.

You can learn more about this matter simply by typing the words "ascii to/from unicode" into any search engine.

As people say, thank you for your attention.

Post by **Latchezar Mintcheff** » Fri Feb 27, 2004 3:36 pm

Yes, arringtonve, Atlantis Ocean Mind does not support Unicode fonts beyond their first part, 1252 (Windows) US (ANSI).

Try switching to Cyrillic and typing something in Arial, for example, to see what happens. Nothing useful.

Regards.

Post by **admin** » Fri Feb 27, 2004 6:16 pm

Atlantis can be used for composing documents in any language (except the Far-East languages - Japanese and Chinese). Atlantis works in the same way as other word processors work (and as they worked many years ago).

When you type text in Atlantis (or in any other word processor), newly typed characters are automatically formatted with the code page (character set) associated with the current keyboard layout.

By default, Windows installs a single keyboard layout. This default keyboard layout matches the default language specified by user.

This is why when the US users type text in Atlantis, it is automatically marked with the Western character set, and with the "English (United States)" language.

Bulgarian users normally select "Bulgarian" as their default language of Windows. Consequently Windows installs the Bulgarian keyboard layout as a default layout. And when Bulgarian users type text in Atlantis, normally it is automatically marked with the Cyrillic character set, and with the "Bulgarian" language.

If a user wants to type text in different languages which use different character sets (in most cases, different alphabets), he/she installs additional keyboard layouts through the “Control Panel” of Windows. Windows offers a convenient mechanism for switching among the installed keyboard layouts (with the Alt+Shift keys, or through the system tray of the Windows Task bar). Many Bulgarian users also install the English keyboard layout for typing texts in English. When they launch Atlantis (or another word processor), they activate the corresponding keyboard layout before typing text. When they need to type text in Bulgarian, they activate the Bulgarian keyboard layout. Any typed letter gets displayed with the Cyrillic character set, and gets associated with the Bulgarian language. When they need to type text in English, they activate the English keyboard layout. Any typed letter gets displayed with the Western European character set, and gets associated with the English language.

This is how ALL word processors work (including MS Word). There is no other method for typing multilingual documents.
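The mechanism described above, in which each typed run is tagged with the code page of the keyboard layout that was active, can be modelled in a few lines of Python. This is a simplified sketch, not how Atlantis or Windows actually store text: the layout-to-code-page table, the `Run` class and `type_with_layout` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical table: ANSI code page associated with each keyboard layout.
LAYOUT_CODEPAGE = {
    "English (United States)": "cp1252",  # Western European
    "Bulgarian": "cp1251",                # Cyrillic
}


@dataclass
class Run:
    """A run of text: 8-bit codes plus the code page and language they belong to."""
    data: bytes
    codepage: str
    language: str

    def text(self) -> str:
        return self.data.decode(self.codepage)


def type_with_layout(text: str, layout: str) -> Run:
    """Model typing: characters are stored in the active layout's code page."""
    codepage = LAYOUT_CODEPAGE[layout]
    return Run(text.encode(codepage), codepage, layout)


document = [
    type_with_layout("Hello, ", "English (United States)"),
    type_with_layout("Роберт", "Bulgarian"),
]
# Each run is decoded with its own code page, so both scripts survive.
full_text = "".join(run.text() for run in document)  # "Hello, Роберт"
```

Because every run carries its own code page, a single 8-bit document can mix Latin and Cyrillic text without Unicode.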

Post by **Robert** » Fri Feb 27, 2004 8:38 pm

Greetings--

Latchezar Mintcheff wrote: But! I recently needed to convert a small part of a book that was (and still is) typeset in a 256-symbol ASCII font to Unicode Times New Roman (or another Unicode font). I have not found any option to do this in Atlantis Ocean Mind.

1) I'd like to avoid launching into any kind of silly although trendy debate about the pros and cons of UNICODE vs NON-UNICODE coding.
Those of you interested in accurate and technical descriptions of characters, glyphs, character sets, ASCII codes, Code Pages, UNICODE, and the like, will find all relevant information on the Microsoft site at the following addresses:

http://www.microsoft.com/typography/unicode/cscp.htm

http://www.microsoft.com/typography/unicode/cs.htm

2) On the practical matter of language support in Atlantis or in any other word processor, things are currently very simple.
Unless you want to use or create documents in the Chinese, Japanese or Korean languages, you do not need UNICODE at all. I repeat, you do NOT need UNICODE at all.
All languages except the Chinese, Japanese or Korean languages can be written and displayed on a computer using the standard ANSI Code Pages.

3) As I understand things, and to remain practical, your "problem" document was written in Cyrillic. Cyrillic is covered by the traditional Single-Byte character sets (SBCS) of the ANSI Standard.
If it originally displayed correctly, then your document was written using the appropriate ANSI character set. There is no need to use UNICODE to display your document at all.

4) You write that your document was "typeset in 256 symbol ASCII font". Strictly-speaking, "ASCII is contained within 2 to the 7th power, or 128 characters" as explained on the Microsoft site (http://www.microsoft.com/typography/unicode/cs.htm).
Here is from this same Microsoft page:

"You can think of Windows ANSI as a lower 128, and an upper 128. The lower 128 is identical to ASCII, and the upper 128 is different for each ANSI character set, and is where the various international characters are parked." (http://www.microsoft.com/typography/unicode/cs.htm)

So if your document is Cyrillic, it was not written using any so-called "ASCII font" but using the appropriate 8-Bit ANSI character set and codepage. You do not need to "convert it to UNICODE font" at all.
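Python's built-in codecs for the Windows code pages make the lower/upper split easy to check. A small sketch; the choice of code pages and of the byte 0xC0 is arbitrary:

```python
# Windows ANSI code pages: Central European, Cyrillic, Western, Baltic.
CODEPAGES = ["cp1250", "cp1251", "cp1252", "cp1257"]

lower_128 = bytes(range(128))

# The lower 128 codes decode exactly as ASCII in every one of them...
lower_is_ascii = {cp: lower_128.decode(cp) == lower_128.decode("ascii") for cp in CODEPAGES}

# ...while a single upper code (0xC0) is a different letter in each.
upper_c0 = {cp: bytes([0xC0]).decode(cp) for cp in CODEPAGES}
# cp1250: U+0154, cp1251: U+0410, cp1252: U+00C0, cp1257: U+0104
```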

5) Here is what I can suggest:

a) Open your "problem" document in any application (NotePad, Wordpad, or the original application where it was created).
b) Copy its whole contents to the Windows clipboard.
c) Open Atlantis and switch to the appropriate (Cyrillic?) CodePage.
d) Paste the clipboard contents into a new Atlantis document.
e) Select this whole new document, then apply any appropriate language coding ("Format | Language...")
f) Save your document as RTF or DOC as you please.

Your "new" document should display correctly in its original language with the appropriate glyphs.

Note that the choice of font(s) is not dependent on whether it is a UNICODE or NON-UNICODE font, but on whether you want to format your document with a typeface or another. Your choice of font(s) should depend on how you want your document(s) to look. Fonts at the end-user's level are a matter of design and looks, not of "coding".

Cheers
Robert

Post by **Latchezar** » Sat Feb 28, 2004 4:31 am

Thanks, Robert!

The matter in general is how you explained, yes.

By carelessly saying "256-character ASCII font", I simply meant the so-called "old" 256-character fonts. Of course, they are not just "old" but, well, let's say, from before the Unicode standard.

You know the structure of a Unicode font.

WordPad in Win 98, if I remember correctly, showed the parts of Arial, for example, separately: Arial LATIN, Arial CYR, etc.

WordPad in Win XP switches between Latin and Cyrillic with no problems, but doesn't show the sets separately: it shows only one thing, Arial, and that's all. So does Atlantis.

Interestingly, QuarkXPress too shows Arial LATIN, Arial CYR, etc., even though people say it doesn't support Unicode. That's not true; it does.

Let's see now.

I open Charles Perrault's Tales, typeset in the 256-character SP Time font, and copy a paragraph. I open Atlantis. I switch to Bulgarian. I even choose Format -> Language -> Bulgarian. And now! Flourishes and drums! I paste the text into the empty document, with the font set to Arial (Unicode).

Here is the result:

Íå âñè÷êè ñà ãè ÷åëè â òîçè âèä, à ìàëöèíà ïîçíàâàò ïúðâèòå òðè â ñòèõîâå. Äîñåãà ñà ïðåðàçêàçâàíè ïî ìíîãî íà÷èíè è ñà èçäàâàíè ñ ðàçëè÷íè èëþñòðàöèè. Ìîæå áè ñå ñìÿòà, ÷å äåöàòà íÿìà äà ãè ðàçáåðàò äîáðå â èñòèíñêèÿ èì âèä. Âñúùíîñò äåöàòà ðàçáèðàò âñè÷êî, êîåòî è âúçðàñòíèòå, ÷å è äðóãè íåùà. Òúé èëè èíà÷å ? åòî ãè òóê.

I select the text and format it again. Format -> Language -> Bulgarian. Again the same charming "text".

Now I'll write "Robert" in Atlantis with Arial.

Latin:

Robert

I switch to Bulgarian:

Ðîáåðò

Do you like your new name?

I open WordPad.

Latin:

Robert

I switch to Bulgarian:

??????

I WANT TO WRITE IN ATLANTIS IN BULGARIAN WITH UNICODE!

Friends, greetings from Sofia!

By the way, "Administrator" in Bulgarian with Arial in Atlantis is:

Àäìèíèñòðàòîð

I'm not sure that Cyril and Methodius, the patron saints of Europe, had this in mind.
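The garbled strings in this post are consistent with Windows-1251 (Cyrillic) codes being displayed with the Western Windows-1252 code page. Assuming that is what happened, this Python sketch reverses it: the displayed characters are turned back into their 8-bit codes, and those codes are read as Windows-1251. `redisplay` is a hypothetical helper name, not an Atlantis command.

```python
def redisplay(garbled: str, shown_as: str = "cp1252", actual: str = "cp1251") -> str:
    """Undo text whose 8-bit codes were displayed with the wrong code page.

    Encoding with the code page the text was *shown* in recovers the
    original codes; decoding them with the code page they were *typed*
    in gives back the intended letters.
    """
    return garbled.encode(shown_as).decode(actual)


name = redisplay("Ðîáåðò")                   # "Роберт"
title = redisplay("Àäìèíèñòðàòîð")           # "Администратор"
phrase = redisplay("Íå âñè÷êè ñà ãè ÷åëè")   # "Не всички са ги чели"
```

The round trip only works when the wrong and right code pages are both known; it does not guess them.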

Post by **Latchezar** » Sat Feb 28, 2004 4:43 am

Wait, wait, wait!!!

The whole time I was pasting, switching Lat -> Cyr and writing in Bulgarian, I was seeing so-called "monkeys" or "Mongolians"!

It was like that even when I pasted the text here.

But after I posted the message, the machine read it correctly (well, almost correctly)!

Why does Atlantis display rubbish, then?! (With my apologies, of course.)

I don't understand anything anymore!

Post by **Robert** » Sat Feb 28, 2004 6:18 am

Latchezar wrote: WordPad in Win XP switches to/from Latin to Cyrillic with no problems, but doesn't show the sets separately - it shows only one thing, Arial, and that's all. So does Atlantis.

Please, do not be so concerned about ANSI or UNICODE subsets.
Please, be practical.
1) Install and activate the appropriate language support on your Windows system.
2) Open Atlantis, then switch to the required language coding.
3) Choose a font face, a font size, a font colour if required, then apply it to a selection or to your whole document.

Your document or selection should display with the appropriate glyphs.

That's all there is to language support in Windows and Atlantis. You won't need any extra frills.

People typing documents in different languages have (or should have) only one single real problem, i.e. that of the keyboard they are using.

There are various solutions to that problem, but neither ANSI nor UNICODE is part of the solution.

Latchezar wrote: I WANT TO WRITE IN ATLANTIS IN BULGARIAN WITH UNICODE!

Again, you do NOT need UNICODE to type Bulgarian characters in Atlantis or any other word processor.
All that is needed is as described right above.

Your problem might be with the keyboard if it isn't a Bulgarian keyboard. But again, UNICODE would not help with this!

Finally, do not forget that Atlantis has a "Format | Default Language..." command.
In the "Default Language Coding" dialog, you can specify which character set (language) you want Atlantis to automatically associate with your typing.

Use the options in this "Format | Default Language..." dialog and your typing will be plain sailing.

Cheers
Robert

Post by **admin** » Sat Feb 28, 2004 8:10 am

To sum up all the said above:

If you want to type texts in different languages, you should use the corresponding keyboard layouts. Note that some fonts include glyphs of the Western European characters only. Such fonts cannot be used for typing texts in Central or Eastern European languages.

If you have a properly saved RTF or DOC containing text in your language, this document should be displayed properly in any properly designed word processor (again, this has nothing to do with UNICODE).

The language characteristic of text in Atlantis (displayed in the status bar) in no way affects how this text is displayed in Atlantis. It is used by the spellchecker, the AutoCorrect feature of Atlantis (and a few other features). In the future, it will be also used by the automatic hyphenation of Atlantis.
Click the following link to read more about the Language characteristic in Atlantis:
http://www.rssol.com/en/html/tips/atlan ... guages.htm

If you have a plain text document which contains text in your language, and this language does not match the language of your system (of your Windows), this plain text document might be displayed wrongly in Atlantis. Plain text documents do not include information about the ANSI code page which should be used for displaying them. This is why Atlantis uses the default ANSI code page for displaying such plain text documents. Again, most Bulgarian users specify “Bulgarian” as the default language of their Windows, so they experience no problems opening such plain text documents, or plain text documents containing English texts, on their systems. The problem arises when you try to open a Bulgarian plain text file on a system with English as the default language. In such cases, your Bulgarian plain text document would be displayed in the Western ANSI code page (and consequently, it would be unreadable).

But Atlantis can resolve even such cases, which, to be honest, are rather exotic (not many Western users need to open plain text files containing texts in non-Western languages).
Atlantis allows you to manually select the ANSI code page which should be used for displaying the selected text. Atlantis has the “Change code page” command (the “Format” category). Atlantis users very rarely need it, so by default Atlantis does not add this command to either the toolbars or the menus. But you can add it to your menus or toolbars through the “Tools | Menus” or “Tools | Toolbars” dialogs. You can also associate this command with a hot key through the “Tools | Hot Keys” dialog. If your Bulgarian plain text document is displayed wrongly in Atlantis (again, because you are opening it on a system with a non-Cyrillic default language), you can select the unreadable text, choose the “Change code page” command, and select the actual language of the selected text. Atlantis will automatically apply the associated ANSI code page to the selected text. To avoid further inconvenience when reopening this plain text document, you should save it as RTF, DOC, or COD.
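Why saving helps: unlike plain text, an RTF file can name its code page. A minimal Python sketch of writing such a file; `to_rtf` is a hypothetical helper, and the Arial font, code page 1251 and `\fcharset204` (the Cyrillic charset value) are assumptions chosen for a Bulgarian example. Real Atlantis or Word output contains much more than this.

```python
def to_rtf(text: str, codepage: int = 1251, charset: int = 204, font: str = "Arial") -> str:
    """Write text as a minimal RTF document that records its ANSI code page.

    The ansicpg keyword names the code page and fcharset (204 = Cyrillic)
    tags the font, so a reader need not guess as it must with plain text.
    Assumption: every character exists in the chosen code page.
    """
    body = []
    for code in text.encode(f"cp{codepage}"):
        char = chr(code)
        if char == "\n":
            body.append("\\par ")          # paragraph break
        elif char in "\\{}":
            body.append("\\" + char)       # escape RTF's special characters
        elif code < 0x80:
            body.append(char)              # ASCII is the same in every code page
        else:
            body.append(f"\\'{code:02x}")  # upper-128 code as a hex escape
    header = f"{{\\rtf1\\ansi\\ansicpg{codepage}{{\\fonttbl{{\\f0\\fcharset{charset} {font};}}}}\\f0 "
    return header + "".join(body) + "}"


rtf = to_rtf("Роберт")
```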

Atlantis also has another feature intended mainly for non-Western users who use multiple keyboard layouts. If you accidentally typed a Bulgarian sentence with the English keyboard layout activated, your text would be unreadable. Changing the code page would not help either, because the English keyboard layout generates only the basic ASCII characters (with codes below 128). These basic characters (glyphs) are the same in all ANSI code pages. But the “Convert text between keyboard layouts” command (the “Edit” category) allows you to fix such wrongly typed text. Select the unreadable text, choose the “Convert text between keyboard layouts” command (note: by default, Atlantis does not add this command to the menus and toolbars either), and specify the keyboard layout which was wrongly used for typing your Bulgarian text (in our case, the “English” keyboard layout) and the keyboard layout which was supposed to be used (in our case, the “Bulgarian” keyboard layout). Atlantis will automatically “retype” your text in the proper keyboard layout, and associate it with the corresponding ANSI code page and language.
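A rough Python sketch of the idea behind such a conversion (not Atlantis's actual code): each character is mapped from the key that produced it on one layout to the character the same key produces on the other. The table below is hypothetical and deliberately partial, covering only a few lowercase keys of a phonetic-style Bulgarian layout; a real converter would build the full table from the two installed Windows keyboard layouts.

```python
# Hypothetical, deliberately partial table: the Cyrillic letter produced by
# each Latin key on a phonetic-style Bulgarian layout. A real converter would
# build the full table from the two installed Windows keyboard layouts.
LATIN_TO_BULGARIAN = str.maketrans({
    "a": "а", "b": "б", "d": "д", "e": "е", "g": "г", "i": "и", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "t": "т", "u": "у",
})


def convert_between_layouts(typed: str, table: dict = LATIN_TO_BULGARIAN) -> str:
    """Re-type text as if the same keys had been pressed on the other layout.

    Keys missing from the table (digits, punctuation, capitals here) are kept.
    """
    return typed.translate(table)


fixed = convert_between_layouts("dobre")  # typed with the English layout active
```

Unlike the code page change, this remaps characters one by one, which is why it can rescue text whose codes are all plain ASCII.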

Post by **Latchezar** » Sat Feb 28, 2004 11:56 am

Friends, thank you very much for your time, attention and help!

I didn't want to make a big problem out of this matter.

In the next few minutes I'll experiment with all your advice.

Till next time

Greetings.

Latchezar

Post by **ron** » Sun Nov 06, 2005 9:53 pm

I don't think Atlantis behaves like other word processors in the manner discussed in this thread. Here is a simple RTF file (all the formatting left out to make the example simple):

{\rtf1
{\uc0\u256 }
}

Type this into a text file and save it as RTF. Now open it with various word processors. Most of the time you will get the correct answer (a single "Latin Capital A with macaroni"), but not with Atlantis.

I believe at heart that this is the issue that was being discussed.

As discussed previously, typing in a Latin Capital A with macaroni is possible in Atlantis, but pasting one in is not possible.
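For reference, a minimal Python sketch of a reader that honours `\ucN` (how many fallback characters follow), `\uN` and `\'hh`, which is roughly what a reader needs to get the "A with macaron" out of the snippet above. `rtf_body_text` is a hypothetical helper: it ignores group scoping of `\uc`, does not skip destinations such as font tables, and does not recombine surrogate pairs, so it only suits small body-only snippets.

```python
import re

# Token kinds: \'hh hex code, control word (+ optional numeric parameter and
# its delimiting space), control symbol, group brace, run of plain text.
TOKEN = re.compile(
    r"\\'([0-9a-fA-F]{2})"
    r"|\\([a-z]+)(-?\d+)? ?"
    r"|\\([^a-z'])"
    r"|([{}])"
    r"|([^\\{}]+)"
)


def rtf_body_text(rtf: str, codepage: str = "cp1252") -> str:
    r"""Extract text from a small RTF snippet, honouring \ucN, \uN and \'hh.

    Simplifications: \uc is not scoped to groups, destinations such as font
    tables are not skipped, and surrogate pairs are not recombined.
    """
    out, uc, skip = [], 1, 0  # the RTF spec gives \uc a default of 1
    for m in TOKEN.finditer(rtf):
        hexcode, word, param, symbol, brace, text = m.groups()
        if hexcode is not None:
            if skip:
                skip -= 1  # this \'hh is the ANSI fallback of a \uN
            else:
                out.append(bytes([int(hexcode, 16)]).decode(codepage))
        elif word == "uc":
            uc = int(param or 1)
        elif word == "u" and param is not None:
            n = int(param)
            out.append(chr(n + 65536 if n < 0 else n))  # N is a signed 16-bit value
            skip = uc  # the next uc characters are fallback text
        elif symbol is not None:
            if symbol in "\\{}":
                out.append(symbol)  # escaped literal
        elif brace is not None:
            skip = 0  # fallback never crosses a group boundary
        elif text is not None:
            for ch in text:
                if ch in "\r\n":
                    continue  # raw line breaks are not text in RTF
                if skip:
                    skip -= 1
                else:
                    out.append(ch)
    return "".join(out)


letter = rtf_body_text("{\\rtf1\n{\\uc0\\u256 }\n}")  # U+0100, A with macron
```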

Post by **Robert** » Mon Nov 07, 2005 1:02 am

ron wrote: As discussed previously, typing in a Latin Capital A with macaroni is possible in Atlantis, but pasting one in is not possible.

Greetings–
I am sorry but you are wrong here.
It is perfectly possible to paste a “Latin Capital A with macaroni” into Atlantis.
There are actually 2 ways to do it:
1. You can paste it using the Atlantis “Insert | Symbol…” dialog, and, for example, the Lithuanian language from the drop-down menu on the top right.
2. You can paste it after copying it from WordPad, for example. Or from any other application that can display it.
Cheers
Robert

Post by **admin** » Mon Nov 07, 2005 1:09 am

ron wrote: I don't think Atlantis behaves like other word processors in the manner discussed in this thread. Here is a simple RTF file (all the formatting left out to make the example simple):

{\rtf1
{\uc0\u256 }
}

This is unrelated to the question discussed in this topic.
Any text that can be represented in ANSI (except for texts in Far-East languages) can be pasted into Atlantis without a problem. No major word processor generates the RTF code specified in your post. Your code is unorthodox at best. "\uc" should never be followed by "0". Its default and normal value is 1 (in some cases 2). And any UNICODE character ("\uNNNN") must be preceded by its equivalent in ANSI coding.

Post by **luha** » Mon Nov 07, 2005 2:13 am

Robert wrote: 2) On the practical matter of language support in Atlantis or in any other word processor, things are currently very simple.
Unless you want to use or create documents in the Chinese, Japanese or Korean languages, you do not need UNICODE at all. I repeat, you do NOT need UNICODE at all.

I want Atlantis to support Unicode, too! I sometimes want to use special characters, like the long s. In Word, I can use a Unicode font and get a long s by typing a special code. In Atlantis, I have to use a special font that contains the long s.

Best regards
Sigurd Hasle

Post by **Guest** » Mon Nov 07, 2005 8:32 am

Robert wrote: I am sorry but you are wrong here.
It is perfectly possible to paste a “Latin Capital A with macaroni” into Atlantis.
There are actually 2 ways to do it:
1. You can paste it using the Atlantis “Insert | Symbol…” dialog, and, for example, the Lithuanian language from the drop-down menu on the top right.
2. You can paste it after copying it from WordPad, for example. Or from any other application that can display it.

Hi,
You are correct; I am partially wrong. We disagree on #1, which I consider typing. But I didn't know about #2. You can do it from WordPad, but not from other word processors that display it.

admin wrote: This is unrelated to the question discussed in this topic.
Any text that can be represented in ANSI (except for texts in Far-East languages) can be pasted into Atlantis without a problem. No major word processor generates the RTF code specified in your post. Your code is unorthodox at best. "\uc" should never be followed by "0". Its default and normal value is 1 (in some cases 2). And any UNICODE character ("\uNNNN") must be preceded by its equivalent in ANSI coding.

My example depends not at all on \uc0! Try \uc1. Here are the guts of the rtf file for a single \u256 character from several programs:

| Program | RTF for a single `\u256` character |
|---|---|
| Polyedit | `\viewkind4\uc1\pard\f0\fs20\u256` |
| OpenOffice | `\loch\f2\fs24\lang1033\i0\b0\f2 \u256` |
| Abiword | `\s29\f0\fs24\lang1033{\*\listtag0}\abinodiroverride\ltrch \uc0\u256` |
| Copywriter | `\viewkind4\uc1\pard\ltrpar\lang1033\f0\fs24\u256` |
| Wordpad | `\viewkind4\uc1\pard\lang1033\f0\fs20\'c2\f1` |
| Atlantis | `\plain\f21\fs24\pard\nowidctlpar\widctlpar\f43\fs22\lang1062 Â` |

It may be the case that Polyedit, OpenOffice, Abiword and Copywriter just encode things not in accord with the RTF standard - I don't know. But it is very common, and Atlantis cannot read it. Of the six programs listed, the first 5 can read all six files. Atlantis can read the Wordpad and Atlantis files. Metapad (96 Kb notepad replacement) can read all 6.

And this is exactly the topic being discussed by the non-Atlantis postings in this thread.

Post by **Ron** » Mon Nov 07, 2005 8:35 am

Sorry,

Guest above is Ron.

Post by **admin** » Mon Nov 07, 2005 9:19 am

The native coding of the Rich Text Format is ANSI. UNICODE characters should be used in RTF documents only when they cannot be represented in ANSI. At the very least, each UNICODE character should be accompanied by its ANSI representation. Here is an excerpt from the “Microsoft Office Word 2003 Rich Text Format (RTF) Specification”:

\uN This keyword represents a single Unicode character that has no equivalent ANSI representation based on the current ANSI code page. N represents the Unicode character value expressed as a decimal number. This keyword is followed immediately by equivalent character(s) in ANSI representation. In this way, old readers will ignore the \uN keyword and pick up the ANSI representation properly.
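The rule quoted above can be sketched as a small Python writer: characters the current ANSI code page can hold are written as `\'hh`, and anything else as `\uN` plus one fallback character announced by `\uc1`. `rtf_escape` and the `?` fallback are assumptions for illustration; a real writer might pick a closer ANSI substitute. U+0100 (Ā) exists in the Baltic code page 1257 at 0xC2, which matches the `\'c2` in the WordPad output listed earlier, but not in the Western code page 1252.

```python
def rtf_escape(text: str, codepage: str) -> str:
    r"""Escape text for an RTF body whose \ansicpg is the given code page.

    Characters the code page can hold are written as \'hh codes (ASCII as is).
    Anything else becomes \uN followed by one ANSI fallback character, "?",
    which \uc1 tells old readers to use instead of the \uN keyword.
    """
    parts = ["\\uc1 "]
    for char in text:
        if char in "\\{}":
            parts.append("\\" + char)
            continue
        try:
            code = char.encode(codepage)
        except UnicodeEncodeError:
            # \uN works on UTF-16 units; N is signed, so values above 32767 go negative.
            units = char.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little")
                parts.append(f"\\u{n - 65536 if n > 32767 else n}?")
            continue
        parts.append(char if code[0] < 0x80 else f"\\'{code[0]:02x}")
    return "".join(parts)


baltic = rtf_escape("\u0100", "cp1257")   # Ā fits in Windows-1257: \uc1 \'c2
western = rtf_escape("\u0100", "cp1252")  # not in Windows-1252: \uc1 \u256?
```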

"\u256" must be followed by the ANSI representation of this character (I said one wrong thing in my previous post: a UNICODE character must be followed by its ANSI equivalent not preceded). The UNICODE character with code “256” can be represented in ANSI. It is irrelevant how AbiWord generates RTFs. As things stand, the RTF specification belongs to Microsoft, and the standard is how MS Word works.

Atlantis Word Processor Forum