Ada for C/C++ Developers: Wide Characters

Originally, Ada's Character type covered only 7-bit ASCII (Ada 83), and Ada 95 widened it to ISO-8859-1 (Latin-1), which handles most of the characters in Western European languages. I suppose that was acceptable in the 80s, and perhaps even in the 90s, since most Ada users seemed to come from the USA, the UK, Germany and France. But in the 21st century? No way, completely unacceptable!

Ada 95 introduced a 16-bit Wide_Character type alongside Character, and Ada 2005 added the 32-bit Wide_Wide_Character type, so your application can support languages such as Chinese, Russian and Arabic. Compilers such as GNAT can also read and write UTF-8 encoded source and text files. As a C++ developer who has struggled with internationalization issues, particularly reading UTF-8 text files, I truly welcome this support in Ada. I may not have the latest information, but I don't believe C++ supports UTF-8 without the help of third-party libraries. Perhaps that will change in C++0x?
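Wide_String is built from 16-bit Wide_Character values, which cover the Basic Multilingual Plane. Characters outside it, such as the musical G clef (U+1D11E), need the 32-bit Wide_Wide_Character type from Ada 2005, together with Wide_Wide_String and Ada.Wide_Wide_Text_IO. The following is a minimal sketch, assuming the GNAT compiler and a UTF-8 source file compiled with -gnatW8; the procedure name Wide_Wide_Text is purely illustrative.

```ada
with Ada.Wide_Wide_Text_IO; use Ada.Wide_Wide_Text_IO;

--  Assumed build command (GNAT): gnatmake -gnatW8 wide_wide_text.adb
procedure Wide_Wide_Text is
   --  U+1D11E (musical G clef) lies outside the Basic Multilingual Plane,
   --  so it fits in a Wide_Wide_Character but not in a Wide_Character.
   Msg : constant Wide_Wide_String := "Россия 中国 𝄞";
begin
   Put_Line (Msg);

   --  'Length counts characters, not bytes: 6 + 1 + 2 + 1 + 1 = 11.
   Put_Line (Natural'Wide_Wide_Image (Msg'Length));
end Wide_Wide_Text;
```

Putting the same literal into a plain Wide_String should be rejected at compile time, because U+1D11E does not fit in a Wide_Character.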

Working with wide characters in Ada is no more difficult than it is in C++. Separate I/O libraries are provided for regular and wide text, just like in C++ (wcout/wcin, wprintf, etc.). For example, Ada.Wide_Text_IO is the wide-character equivalent of Ada.Text_IO. I should also mention that Ada 2005 lets a developer name variables and types in scripts such as Chinese or Hebrew, something C++ compilers have historically supported poorly, if at all. Though I doubt (perhaps naively) that this is used very often, it is interesting nonetheless. Please note that:

  • Ada.Strings.Unbounded and Ada.Strings.Bounded are similar to std::string and char[] arrays in C++, respectively.
  • Ada.Strings.Wide_Unbounded and Ada.Strings.Wide_Bounded are similar to std::wstring and wchar_t[] arrays in C++, respectively.
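To make the std::wstring analogy concrete, here is a minimal sketch using Ada.Strings.Wide_Unbounded together with Ada.Wide_Text_IO. It assumes the GNAT compiler and a UTF-8 source file compiled with -gnatW8; the procedure name Wide_Unbounded_Demo is hypothetical.

```ada
with Ada.Wide_Text_IO;           use Ada.Wide_Text_IO;
with Ada.Strings.Wide_Unbounded; use Ada.Strings.Wide_Unbounded;

--  Assumed build command (GNAT): gnatmake -gnatW8 wide_unbounded_demo.adb
procedure Wide_Unbounded_Demo is
   --  Unbounded_Wide_String grows on demand, much like std::wstring.
   Greeting : Unbounded_Wide_String := To_Unbounded_Wide_String ("Привет");
begin
   --  Append a Wide_String in place (compare std::wstring::append).
   --  The qualification makes the intended Append overload explicit.
   Append (Greeting, Wide_String'(", 世界"));

   --  Convert back to a fixed Wide_String for output.
   Put_Line (To_Wide_String (Greeting));

   --  Length counts Wide_Character values, not bytes: prints " 10".
   Put_Line (Natural'Wide_Image (Length (Greeting)));
end Wide_Unbounded_Demo;
```

Because Length counts characters, the result is 10 even though the UTF-8 encoding of "Привет, 世界" is 20 bytes long. The leading space in " 10" is simply how Ada's 'Image attributes format non-negative numbers.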

Here is an extremely simple program that demonstrates wide characters with Russian and Chinese text. If you copy and paste it into an editor, make sure to save the file as UTF-8, or you might lose the Chinese and Russian text.

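with Ada.Wide_Text_IO;
use Ada.Wide_Text_IO;

--  With GNAT, save this file as UTF-8 and compile it with:
--     gnatmake -gnatW8 widetext.adb
procedure WideText is
   Msg1 : constant Wide_String := "Россия";  --  Cyrillic for "Russia"
   Msg2 : constant Wide_String := "中国";    --  Chinese for "China"
begin
   Put_Line (Msg1);
   Put_Line (Msg2);
end WideText;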

The Msg1 variable holds the Cyrillic for “Russia”, and Msg2 holds the Chinese for “China”. My test environment is Cygwin on Windows, and unfortunately I have not figured out how to get Unicode to display properly in the terminal. Perhaps someone on a Linux box with a UTF-8 terminal could verify this? If you're in the same boat as me, just redirect the output to a file and open it in a Unicode-aware editor.

./widetext.exe > output.txt

Each string should appear on its own line in the file:

Россия
中国
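To check the file from Ada rather than from an editor, a minimal sketch like the following reads output.txt back with Ada.Wide_Text_IO and prints each line with its character count. It assumes GNAT: the Form string "WCEM=8" is a GNAT-specific way to request UTF-8 decoding, and building with -gnatW8 is assumed so that standard output also defaults to UTF-8. The procedure name Read_UTF8 is hypothetical.

```ada
with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;

--  Assumed build command (GNAT): gnatmake -gnatW8 read_utf8.adb
procedure Read_UTF8 is
   Input : File_Type;
begin
   --  "WCEM=8" (GNAT-specific) decodes the file as UTF-8.
   Open (Input, In_File, "output.txt", Form => "WCEM=8");

   while not End_Of_File (Input) loop
      declare
         --  Get_Line as a function (Ada 2005) returns one decoded line.
         Line : constant Wide_String := Get_Line (Input);
      begin
         --  Line'Length counts decoded characters, not UTF-8 bytes.
         Put_Line (Line & " =>" & Natural'Wide_Image (Line'Length));
      end;
   end loop;

   Close (Input);
end Read_UTF8;
```

If decoding works, the Россия line should report 6 characters rather than the 12 bytes its UTF-8 encoding occupies; a stray carriage return from a Windows editor could add one more.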

If anyone ever figures out Unicode in the Cygwin terminal or the Windows cmd.exe console, please let me know!
