Unicode, MBCS and Generic text mappings

By Chris Maunder | 28 Mar 2000
A guide to using generic text functions to make the transition between character sets simple and painless
 
Introduction

In order to allow your programs to be used in international markets, it is worth making your application Unicode or MBCS aware. The Unicode character set is a "wide character" set (2 bytes per character in the UTF-16 encoding Windows uses, although characters outside the Basic Multilingual Plane take a pair of 2-byte units) that aims to cover every character in every language, including technical symbols and special publishing characters. A multibyte character set (MBCS) uses either 1 or 2 bytes per character and is used for character sets that contain large numbers of different characters (e.g. Asian language character sets).

Which character set you use depends on the language and the operating system. Unicode requires more space than MBCS for text that MBCS stores in single bytes, since every Unicode character takes at least 2 bytes. On Windows NT, which uses Unicode internally, it is also faster than MBCS: non-Unicode strings passed to and from the operating system must be translated, incurring overhead. However, Unicode is largely unsupported on Windows 95, so MBCS may be a better choice there. Note that if you wish to develop applications for the Windows CE environment, all applications must be compiled for Unicode.

Using MBCS or Unicode

The best way to use Unicode or MBCS (or indeed even ASCII) in your programs is to use the generic text mapping macros provided by Visual C++. That way you can simply use a single define to swap between Unicode, MBCS and ASCII without having to do any recoding.

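To use MBCS or Unicode you need only define either _MBCS or _UNICODE in your project. For Unicode you will also need to set the entry point symbol in your project's linker settings to wWinMainCRTStartup. Please note that if both _MBCS and _UNICODE are defined, the result will be unpredictable. If you are not using MFC, also define UNICODE (no leading underscore): the Windows headers use UNICODE to choose between the A and W versions of API functions, while _UNICODE controls TCHAR.H and the C run-time library.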
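Because the generic mappings are selected purely by preprocessor symbols, a source file can check its own configuration. The following is a minimal sketch, not something Visual C++ provides: it refuses to compile when both _UNICODE and _MBCS are defined and reports which mapping is active. The names character_set and example_usage are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Refuse to build with the unpredictable combination of both symbols.
#if defined(_UNICODE) && defined(_MBCS)
#error "Define either _UNICODE or _MBCS for this project, not both."
#endif

// Hypothetical helper: name the generic-text mapping this file was compiled for.
constexpr const char* character_set()
{
#if defined(_UNICODE)
    return "Unicode";
#elif defined(_MBCS)
    return "MBCS";
#else
    return "SBCS";
#endif
}

void example_usage()
{
#if !defined(_UNICODE) && !defined(_MBCS)
    // With neither symbol defined, the single-byte (ASCII) mapping is in effect.
    assert(std::strcmp(character_set(), "SBCS") == 0);
#endif
}
```

Putting a guard like this in a header that every file includes (a precompiled header, say) turns the mistake into a compile-time error rather than a run-time mystery.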

[Figure: Adjusting your project settings]

Generic Text mappings and portable functions

The generic text mappings replace the standard char or LPSTR types with the generic TCHAR or LPTSTR types. These map to different types and functions depending on whether you have compiled with Unicode or MBCS (or neither) defined. The simplest way to use the TCHAR type is through the CString class: it is extremely flexible and does most of the work for you.

In conjunction with the generic character type, there is a set of generic string manipulation functions prefixed by _tcs. For instance, instead of using the strrev function in your code, you should use the _tcsrev function, which will map to the correct function depending on which character set you have compiled for. The table below demonstrates:

| #define | Compiled version | Example |
|---|---|---|
| _UNICODE | Unicode (wide-character) | _tcsrev maps to _wcsrev |
| _MBCS | Multibyte-character | _tcsrev maps to _mbsrev |
| None (the default: neither _UNICODE nor _MBCS defined) | SBCS (ASCII) | _tcsrev maps to strrev |
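The mapping in the table is done with nothing more than typedefs and #defines. The sketch below imitates that mechanism in portable C++ so it can be compiled anywhere. The names sketch_strrev, sketch_wcsrev, sketch_tcsrev, sketch_tchar and SKETCH_UNICODE are hypothetical, not the real TCHAR.H symbols, and the MBCS case is left out because a real _mbsrev must keep the bytes of each multibyte character together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <utility>

// Hypothetical stand-ins for strrev/_wcsrev: reverse a null-terminated
// string in place and return it.
inline char* sketch_strrev(char* s)
{
    const std::size_t n = std::strlen(s);
    for (std::size_t i = 0; i < n / 2; ++i)
        std::swap(s[i], s[n - 1 - i]);
    return s;
}

inline wchar_t* sketch_wcsrev(wchar_t* s)
{
    const std::size_t n = std::wcslen(s);
    for (std::size_t i = 0; i < n / 2; ++i)
        std::swap(s[i], s[n - 1 - i]);
    return s;
}

// One symbol selects both the character type and the function it maps to.
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;
#define sketch_tcsrev sketch_wcsrev
#else
typedef char sketch_tchar;
#define sketch_tcsrev sketch_strrev
#endif

void example_usage()
{
    // The same source lines compile for either mapping.
    sketch_tchar buf[] = { 'a', 'b', 'c', 0 };
    sketch_tcsrev(buf);
    assert(buf[0] == 'c' && buf[1] == 'b' && buf[2] == 'a');
}
```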

Each str* function has a corresponding _tcs* function that should be used instead. See the TCHAR.H file for all the mappings and macros that are available, or look up the online help for the string function in question to find the equivalent portable function.

Note: Do not use the str* family of functions with Unicode strings. Each Unicode character occupies 2 bytes, and for ordinary Latin text one of those bytes is zero, so a str* function sees an embedded null byte and stops early.
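To see the problem at the byte level, this portable sketch uses char16_t (standard since C++11), which has the same 2-byte layout as a Windows Unicode string, and hands its storage to strlen as a str* function would. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A 2-byte-per-character string, laid out like a Windows Unicode string.
const char16_t wide[] = u"Hello";

// Characters actually in the string.
std::size_t wide_length()
{
    return std::char_traits<char16_t>::length(wide);
}

// What a str* function sees: 'H' is stored as the bytes 0x48 0x00
// (little-endian) or 0x00 0x48 (big-endian), so strlen stops after
// at most one byte.
std::size_t narrow_length_of_wide_bytes()
{
    return std::strlen(reinterpret_cast<const char*>(wide));
}

void example_usage()
{
    assert(wide_length() == 5);
    assert(narrow_length_of_wide_bytes() <= 1);   // not 5, and not 10
}
```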

The next important point is that each literal string should be enclosed by the TEXT() (or _T()) macro. This macro prepends an L to literal strings if the project is being compiled for Unicode, and does nothing if MBCS or ASCII is being used. For instance, the string _T("Hello") will be interpreted as "Hello" in MBCS or ASCII, and L"Hello" in Unicode. If you are working in Unicode and do not use the _T() macro, you will get compiler errors wherever a narrow literal is passed where a wide-character string such as LPCTSTR is expected.
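TCHAR.H implements _T() with the preprocessor's token-pasting operator, through two levels of macro. The sketch below imitates that technique under hypothetical names (SKETCH_T, sketch_tchar, SKETCH_UNICODE), so it illustrates the mechanism rather than reproducing the real header:

```cpp
#include <cassert>
#include <cwchar>
#include <type_traits>

// Compile this sketch for "Unicode"; delete the next line for the narrow mapping.
#define SKETCH_UNICODE

#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;
#define SKETCH_T_(x) L ## x      // paste L onto the literal: "Hello" becomes L"Hello"
#else
typedef char sketch_tchar;
#define SKETCH_T_(x) x           // leave the literal unchanged
#endif
#define SKETCH_T(x) SKETCH_T_(x) // extra level lets a macro argument expand first

const sketch_tchar* const greeting = SKETCH_T("Hello");

void example_usage()
{
#ifdef SKETCH_UNICODE
    static_assert(std::is_same<sketch_tchar, wchar_t>::value, "wide mapping selected");
    assert(std::wcslen(greeting) == 5);
#endif
}
```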

Note that you can use ASCII and Unicode within the same program, but not within the same string.

All MFC functions except for database class member functions are Unicode aware. This is because many database drivers themselves do not handle Unicode, and so there was no point in writing Unicode aware MFC classes to wrap these drivers.

Converting between Generic types and ASCII

ATL provides a bunch of very useful macros for converting between different character formats. The basic form of these macros is X2Y(), where X is the source format and Y the destination format. Possible conversion formats are shown in the following table.

| String type | Abbreviation |
|---|---|
| ASCII (LPSTR) | A |
| Wide (LPWSTR) | W |
| OLE (LPOLESTR) | OLE |
| Generic (LPTSTR) | T |
| Const | C |

Thus, A2W converts an LPSTR to an LPWSTR, OLE2T converts an LPOLESTR to an LPTSTR, and so on.

There are also const forms (denoted by a C) that convert to a const string. For instance, A2CT converts from LPSTR to LPCTSTR.

When using the string conversion macros you need to include the USES_CONVERSION macro at the beginning of your function:

#include <atlconv.h>   // ATL string conversion macros

void foo(LPSTR lpsz)
{
    USES_CONVERSION;
    // ...
    LPTSTR szGeneric = A2T(lpsz);
    // Do something with szGeneric
    // ...
}

Two caveats on using the conversion macros:

  1. Never use the conversion macros inside a tight loop. Each conversion allocates a new buffer on the stack (the macros use _alloca), and that memory is not released until the function returns, so a loop can exhaust the stack as well as produce slow code. It is better to perform the conversion outside the loop and pass the converted value into the loop.

  2. Never return the result of the macros directly from a function, unless the return type makes a copy of the data before the function returns. The converted string lives in the function's stack frame, which is gone once the function returns. For instance, if you have a function that returns an LPTSTR, then do not do the following:

    LPTSTR BadReturn(LPSTR lpsz)
    {
        USES_CONVERSION;
        // do something
        return A2T(lpsz);   // may point into this function's stack frame
    }

    Instead, you should return the value as a CString, which makes a copy of the string before the function returns:

    CString GoodReturn(LPSTR lpsz)
    {
        USES_CONVERSION;
        // do something
        return A2T(lpsz);
    }
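The ATL macros themselves are Windows-only, but both caveats can be illustrated portably. In this sketch, ascii_to_wide is a hypothetical stand-in for A2W that handles 7-bit ASCII only (the real macro uses the ANSI code page). It returns a std::wstring by value, which plays the role of the CString in GoodReturn, and the hypothetical total_length shows the conversion hoisted out of a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical portable stand-in for A2W: widen 7-bit ASCII text.
// Returning by value hands the caller its own copy, as GoodReturn does.
std::wstring ascii_to_wide(const char* lpsz)
{
    std::wstring out;
    for (; *lpsz != '\0'; ++lpsz)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*lpsz)));
    return out;
}

// Caveat 1: convert once, before the loop, and reuse the result inside it.
std::size_t total_length(const char* lpsz, int repeats)
{
    const std::wstring wide = ascii_to_wide(lpsz);   // one conversion
    std::size_t total = 0;
    for (int i = 0; i < repeats; ++i)
        total += wide.size();                        // no conversion per iteration
    return total;
}

void example_usage()
{
    assert(ascii_to_wide("abc") == L"abc");
    assert(total_length("abc", 4) == 12);
}
```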

Tips and Traps

The TRACE statement

The TRACE macro has a few cousins, namely the TRACE0, TRACE1, TRACE2 and TRACE3 macros. These macros allow you to specify a format string (as in the normal TRACE macro) and either 0, 1, 2 or 3 parameters, without the need to enclose your literal format string in the _T() macro. For instance,

TRACE(_T("This is trace statement number %d\n"),1);

can be written

TRACE1("This is trace statement number %d\n",1);
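No _T() is needed because TRACE1 wraps its format argument in _T() itself. This portable sketch shows the same idea under hypothetical names: SKETCH_TRACE1 pastes the L prefix onto the literal, and sketch_trace formats the message into a string (kept in g_last_trace so the result can be checked) instead of sending it to the debugger.

```cpp
#include <cassert>
#include <cstdarg>
#include <cwchar>
#include <string>

// Hypothetical trace sink: records the formatted message instead of
// writing it to the debugger output window.
std::wstring g_last_trace;

void sketch_trace(const wchar_t* format, ...)
{
    wchar_t buffer[256] = L"";
    va_list args;
    va_start(args, format);
    std::vswprintf(buffer, sizeof buffer / sizeof buffer[0], format, args);
    va_end(args);
    g_last_trace = buffer;
}

// Like TRACE1, the macro adds the wide-string prefix itself,
// so callers pass an ordinary literal.
#define SKETCH_TRACE1(sz, p1) sketch_trace(L ## sz, p1)

void example_usage()
{
    SKETCH_TRACE1("This is trace statement number %d\n", 1);
    assert(g_last_trace == L"This is trace statement number 1\n");
}
```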

Viewing Unicode strings in the debugger

If you are using Unicode in your application and wish to view Unicode strings in the debugger, you will need to go to Tools | Options | Debug and check "Display Unicode Strings".

The Length of strings

Be careful when performing operations that depend on the size or length of a string. For instance, CString::GetLength returns the number of TCHAR units in a string, NOT its size in bytes. If you were to write the string to a CArchive object, you would need to multiply the length of the string by the size of each character to get the number of bytes to write:

CString str = _T("Hello, World");
archive.Write( str, str.GetLength( ) * sizeof( TCHAR ) );
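The same arithmetic in portable C++, as a sketch: std::u16string stands in for a Unicode CString (char16_t playing the role of TCHAR), and bytes_to_write is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// char16_t plays the role of TCHAR in a Unicode build.
typedef char16_t sketch_tchar;

// size() counts characters (TCHAR units), so scale by the unit size to get bytes.
std::size_t bytes_to_write(const std::basic_string<sketch_tchar>& str)
{
    return str.size() * sizeof(sketch_tchar);
}

void example_usage()
{
    const std::u16string str = u"Hello, World";
    assert(str.size() == 12);            // 12 characters...
    assert(bytes_to_write(str) == 24);   // ...but 24 bytes
}
```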

Reading and Writing ASCII text files

If you are using Unicode or MBCS then you need to be careful when writing ASCII files. The safest and easiest way to write text files is to use the CStdioFile class provided with MFC. Just use the CString class and the ReadString and WriteString member functions and nothing should go wrong. However, if you need to use the CFile class and its associated Read and Write functions, be aware that if you use the following code:

CFile file(...);
CString str = _T("This is some text");
file.Write( str, (str.GetLength()+1) * sizeof( TCHAR ) );

instead of

CStdioFile file(...);
CString str = _T("This is some text");
file.WriteString(str);

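then the results will be significantly different. In a Unicode build, the first snippet copies each 2-byte character to the file byte for byte, terminating null included. The two lines of text below are from a file created using the first and second code snippets respectively: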

[Figure: Unicode strings written by the two snippets, viewed using WordPad]
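The difference can be reproduced portably. In this sketch (hypothetical helper names; a std::ostringstream stands in for the file), raw_write copies a UTF-16 string's storage byte for byte with its terminating null, as the CFile snippet does in a Unicode build, while text_write writes one byte per character. That second function only approximates CStdioFile's text-mode output and assumes the text is pure ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// CFile::Write-style: copy (length + 1) * sizeof(char16_t) raw bytes.
std::string raw_write(const std::u16string& str)
{
    std::ostringstream file;
    file.write(reinterpret_cast<const char*>(str.c_str()),
               static_cast<std::streamsize>((str.size() + 1) * sizeof(char16_t)));
    return file.str();
}

// Approximation of a text-mode WriteString: one byte per character.
// Assumption: every character is ASCII (below 0x80).
std::string text_write(const std::u16string& str)
{
    std::string out;
    for (char16_t c : str)
        out.push_back(static_cast<char>(c));
    return out;
}

void example_usage()
{
    const std::u16string str = u"Hi";
    const std::string raw = raw_write(str);
    assert(raw.size() == 6);                       // 3 units of 2 bytes, null included
    assert(raw.find('\0') != std::string::npos);   // zero bytes end up in the file
    assert(text_write(str) == "Hi");               // 2 readable bytes
}
```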

Not all structures use the generic text mappings

For instance, the CHARFORMAT structure, if the RichEdit control version is less than 2.0, uses a char[] for the szFaceName field instead of a TCHAR[] as would be expected. You must be careful not to blindly change "..." to _T("...") without first checking. In this case, you would probably need to convert from TCHAR to char before copying any data to the szFaceName field.
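A minimal sketch of that conversion step, under stated assumptions: NarrowFormat is a hypothetical structure standing in for CHARFORMAT with a 32-character narrow szFaceName field, and copy_face_name is a hypothetical helper that passes ASCII through and replaces anything else with '?'. Real Windows code would normally call WideCharToMultiByte instead, so that characters in the ANSI code page survive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a structure with a fixed-size narrow field.
struct NarrowFormat
{
    char szFaceName[32];
};

// Copy a wide face name into the narrow field. ASCII characters are kept,
// anything else becomes '?', and the copy is truncated and always null-terminated.
void copy_face_name(NarrowFormat& fmt, const wchar_t* face)
{
    std::size_t i = 0;
    for (; i + 1 < sizeof fmt.szFaceName && face[i] != L'\0'; ++i)
    {
        const unsigned long code = static_cast<unsigned long>(face[i]);
        fmt.szFaceName[i] = code < 0x80 ? static_cast<char>(code) : '?';
    }
    fmt.szFaceName[i] = '\0';
}

void example_usage()
{
    NarrowFormat fmt;
    copy_face_name(fmt, L"Arial");
    assert(std::strcmp(fmt.szFaceName, "Arial") == 0);
}
```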

Copying text to the Clipboard

This is one area where you may need to use ASCII and Unicode in the same program, since the CF_TEXT clipboard format holds 8-bit (ANSI) text only. NT systems also offer the CF_UNICODETEXT format if you wish to use Unicode on the clipboard.

Installing the Unicode MFC libraries

The Unicode versions of the MFC libraries are not copied to your hard drive unless you select them during a Custom installation. They are not copied during other types of installation. If you attempt to build or run an MFC Unicode application without the MFC Unicode files, you may get errors.

(From the online docs) To copy the files to your hard drive, rerun Setup, choose Custom installation, clear all other components except "Microsoft Foundation Class Libraries," click the Details button, and select both "Static Library for Unicode" and "Shared Library for Unicode."

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Chris Maunder

Founder
The Code Project
Canada

Chris is the Co-founder, Administrator, Architect, Chief Editor and Shameless Hack who wrote and runs The Code Project. He's been programming since 1988 while pretending to be, in various guises, an astrophysicist, mathematician, physicist, hydrologist, geomorphologist, defence intelligence researcher and then, when all that got a bit rough on the nerves, a web developer. He is a Microsoft Visual C++ MVP both globally and for Canada locally.
 
His programming experience includes C/C++, C#, SQL, MFC, ASP, ASP.NET, and far, far too much FORTRAN. He has worked on PocketPCs, AIX mainframes, Sun workstations, and a CRAY YMP C90 behemoth but finds notebooks take up less desk space.
 
He dodges, he weaves, and he never gets enough sleep. He is kind to small animals.
 
Chris was born and bred in Australia but splits his time between Toronto and Melbourne, depending on the weather. For relaxation he is into road cycling, snowboarding, rock climbing, and storm chasing.

Last Updated 29 Mar 2000
Article Copyright 2000 by Chris Maunder
Everything else Copyright © CodeProject, 1999-2011