
The DLL is written in C++ and sends text as UTF-8 through a const char* callback.

First, is this the correct way to declare the callback?

[UnmanagedFunctionPointer( CallingConvention.StdCall )]
public delegate void TextCallback( string sText );

[DllImport(
    "cppLib.dll",
    CharSet = CharSet.Ansi,
    CallingConvention = CallingConvention.Cdecl )]
public static extern void GetText(
    [MarshalAs( UnmanagedType.FunctionPtr )] TextCallback textCallback );

private TextCallback m_TextCallback;

Native callback:

typedef void ( __stdcall * TextCallback )( const char* szText );

If so, how do I handle the UTF-8 text once it arrives?

I'm sending it to a RichTextBox, and the UTF-8 characters come out as garbage (the ones that happen to be ASCII print fine).
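For illustration (a sketch added here, not part of the original question): decoding UTF-8 bytes with the system ANSI code page, which is roughly what the default CharSet.Ansi marshalling does, produces exactly this kind of garbage for non-ASCII characters:

using System;
using System.Text;

class Utf8VsAnsiDemo
{
    static void Main()
    {
        // "héllo" encoded as UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes( "héllo" );

        // Decoding those bytes with the ANSI code page (Encoding.Default on
        // the .NET Framework) maps each byte to a separate char, so 'é'
        // shows up as two wrong characters (e.g. "hÃ©llo").
        string ansiView = Encoding.Default.GetString( utf8Bytes );

        // Decoding them as UTF-8 recovers the original text.
        string utf8View = Encoding.UTF8.GetString( utf8Bytes );

        Console.WriteLine( ansiView );
        Console.WriteLine( utf8View );   // "héllo"
    }
}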

Thank you all.

ANSWER

In the comments below, TheUndeadFish provided a link with an answer that works. It is also explained to a degree; those interested should take a look. I'm just replicating the answer here as it applies to the code I posted.

Only this modification is needed:

[UnmanagedFunctionPointer( CallingConvention.StdCall )]
public delegate void TextCallback( IntPtr ipText );

The delegate handler then reads the IntPtr (which points to a UTF-8 string from the C++ DLL) as follows:

m_TextCallback = ( ipText ) =>
{
    var data = new System.Collections.Generic.List<byte>();
    var off = 0;
    while( true )
    {
        var ch = Marshal.ReadByte( ipText, off++ );
        if( ch == 0 )
        {
            break;
        }
        data.Add( ch );
    }
    string sptr = Encoding.UTF8.GetString( data.ToArray() );
};
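Two details are easy to miss when wiring this up. The following is a hedged sketch, assuming a WinForms form with a RichTextBox named richTextBox1 (the form and control names are not from the original code): the delegate is kept in a field so the garbage collector cannot collect it while the native side still holds the function pointer, and the callback may arrive on a non-UI thread, so the append to the RichTextBox is marshalled back to the UI thread:

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
using System.Windows.Forms;

public partial class MainForm : Form
{
    [UnmanagedFunctionPointer( CallingConvention.StdCall )]
    public delegate void TextCallback( IntPtr ipText );

    [DllImport( "cppLib.dll", CallingConvention = CallingConvention.Cdecl )]
    public static extern void GetText(
        [MarshalAs( UnmanagedType.FunctionPtr )] TextCallback textCallback );

    // Kept as a field so the delegate outlives the native call; a local
    // variable could be collected while cppLib.dll still holds the pointer.
    private TextCallback m_TextCallback;

    private void StartReceivingText()
    {
        m_TextCallback = ipText =>
        {
            // Copy the NUL-terminated UTF-8 bytes and decode them.
            var data = new List<byte>();
            int off = 0;
            byte b;
            while( ( b = Marshal.ReadByte( ipText, off++ ) ) != 0 )
            {
                data.Add( b );
            }
            string text = Encoding.UTF8.GetString( data.ToArray() );

            // The callback may fire on a worker thread; marshal the UI update.
            richTextBox1.BeginInvoke( new Action( () => richTextBox1.AppendText( text ) ) );
        };

        GetText( m_TextCallback );
    }
}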
asked Aug 4, 2014 at 20:59
3 Comments
  • What is the signature of the native callback? Commented Aug 4, 2014 at 21:03
  • The first google result for "dllimport charset utf-8" is blog.kutulu.org/2012/04/… which looks quite relevant. I quickly skimmed it and its conclusion appears to be that the CharSet attribute doesn't handle conversion to/from UTF-8. Commented Aug 4, 2014 at 23:13
  • @TheUndeadFish. My heartfelt thanks were removed for some reason. Commented Aug 5, 2014 at 1:39

1 Answer


You should use CharSet.Unicode (if the string is a wchar_t*, 2 bytes per char) or CharSet.Ansi (if the string is 1 byte per char).
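For reference, this is what those two choices look like on the delegate declaration (a sketch added here, not part of the original answer); note that neither setting decodes UTF-8:

// If the native callback passed a wchar_t* (UTF-16 on Windows):
[UnmanagedFunctionPointer( CallingConvention.StdCall, CharSet = CharSet.Unicode )]
public delegate void WideTextCallback( string sText );

// If the native callback passed a char* in the system ANSI code page:
[UnmanagedFunctionPointer( CallingConvention.StdCall, CharSet = CharSet.Ansi )]
public delegate void AnsiTextCallback( string sText );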

Since your string is in UTF-8, you should convert by hand; none of the default conversions fits your problem.
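A hand-rolled conversion can look like the helper below (a sketch along the lines of the workaround above, not the answerer's code); on newer runtimes (.NET Core / .NET 5+, but not the .NET Framework) Marshal.PtrToStringUTF8 does the same thing in one call:

using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeStrings
{
    // Counts the bytes up to the NUL terminator, copies them into a managed
    // array, and decodes them as UTF-8.
    public static string Utf8PtrToString( IntPtr ptr )
    {
        if( ptr == IntPtr.Zero )
        {
            return null;
        }

        int length = 0;
        while( Marshal.ReadByte( ptr, length ) != 0 )
        {
            length++;
        }

        byte[] buffer = new byte[length];
        Marshal.Copy( ptr, buffer, 0, length );
        return Encoding.UTF8.GetString( buffer );
    }
}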

answered Aug 4, 2014 at 21:49
Eric Lemes

9 Comments

Thanks, I had missed declaring the delegate as a global. Now I have set the charset to Unicode. However, the same garbage is still printed for non-ASCII characters.
Probably you don't have a Unicode string inside this pointer. You can have several different types of string in C++. The type you specify on the delegate is the type you expect C++ to send you, not the type you want to use internally in .NET. You need to specify the right type for .NET to handle it correctly. If it is a wide string, it will parse the chars in pairs (2 bytes per char). If it is UTF-8, it will parse 1 byte per simple char and 1 to 4 bytes per special char... that's the idea.
Yep, that's the point. The string is in UTF-8. And, yes, C# makes it arrive already "doubled up by brute force", thus converted, but wrongly converted. I found a workaround and updated the original question.
In other words... you've converted by hand. I'm curious about why it didn't work. I have a theory. "Unicode" for C++ means two bytes per char, not real UTF-8 (1 byte per simple char, 1 to 4 bytes per special char). Probably C++ is sending UTF-8 while specifying "Unicode" in the callback expects two bytes per char. Makes sense? (See the byte-count sketch after these comments.)
Maybe in C++ a function like WideCharToMultiByte is being called prior to sending the char* to C#: msdn.microsoft.com/en-us/library/windows/desktop/…
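To make the byte-width theory in the comments concrete (an added sketch, not part of the thread):

using System;
using System.Text;

class EncodingWidthDemo
{
    static void Main()
    {
        // "Unicode" in .NET marshalling terms means UTF-16: every BMP char
        // takes exactly 2 bytes. UTF-8 uses 1 to 4 bytes per char.
        Console.WriteLine( Encoding.Unicode.GetBytes( "a" ).Length ); // 2
        Console.WriteLine( Encoding.UTF8.GetBytes( "a" ).Length );    // 1

        Console.WriteLine( Encoding.Unicode.GetBytes( "é" ).Length ); // 2
        Console.WriteLine( Encoding.UTF8.GetBytes( "é" ).Length );    // 2

        Console.WriteLine( Encoding.Unicode.GetBytes( "€" ).Length ); // 2
        Console.WriteLine( Encoding.UTF8.GetBytes( "€" ).Length );    // 3
    }
}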
