
The DLL is written in C++ and sends text as UTF-8 through a const char* callback.

First, is this the correct way to declare the callback?

[UnmanagedFunctionPointer( CallingConvention.StdCall )]
public delegate void TextCallback( string sText );

[DllImport(
    "cppLib.dll",
    CharSet = CharSet.Ansi,
    CallingConvention = CallingConvention.Cdecl )]
public static extern void GetText(
    [MarshalAs( UnmanagedType.FunctionPtr )] TextCallback textCallback );

private TextCallback m_TextCallback;

Native callback:

typedef void ( __stdcall * TextCallback )( const char* szText );

If so, how do I handle the UTF-8 text once it arrives?

I'm sending it to a RichTextBox, and the UTF-8 characters come out as garbage (the ones that happen to be ASCII print fine).
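For illustration (a sketch added here, not part of the original question): decoding UTF-8 bytes with the system ANSI code page, which is roughly what the default CharSet.Ansi marshalling does, produces exactly this kind of garbage for non-ASCII characters:

using System;
using System.Text;

class Utf8VsAnsiDemo
{
    static void Main()
    {
        // "héllo" encoded as UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes( "héllo" );

        // Decoding those bytes with the ANSI code page (Encoding.Default on
        // the .NET Framework) maps each byte to a separate char, so 'é'
        // shows up as two wrong characters (e.g. "hÃ©llo").
        string ansiView = Encoding.Default.GetString( utf8Bytes );

        // Decoding them as UTF-8 recovers the original text.
        string utf8View = Encoding.UTF8.GetString( utf8Bytes );

        Console.WriteLine( ansiView );
        Console.WriteLine( utf8View );   // "héllo"
    }
}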

Thank you all.

ANSWER

In the comments below, TheUndeadFish provided a link with an answer that works. It is also explained to a degree; those interested should take a look. I'm just replicating the answer here as it applies to the code I posted.

Only this modification is needed:

[UnmanagedFunctionPointer( CallingConvention.StdCall )]
public delegate void TextCallback( IntPtr ipText );

The delegate handler then reads the IntPtr (which points to a UTF-8 string from the C++ DLL) as follows:

m_TextCallback = ( ipText ) =>
{
    var data = new System.Collections.Generic.List<byte>();
    var off = 0;
    while( true )
    {
        var ch = Marshal.ReadByte( ipText, off++ );
        if( ch == 0 )
        {
            break;
        }
        data.Add( ch );
    }
    string sptr = Encoding.UTF8.GetString( data.ToArray() );
};
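Two details are easy to miss when wiring this up. The following is a hedged sketch, assuming a WinForms form with a RichTextBox named richTextBox1 (the form and control names are not from the original code): the delegate is kept in a field so the garbage collector cannot collect it while the native side still holds the function pointer, and the callback may arrive on a non-UI thread, so the append to the RichTextBox is marshalled back to the UI thread:

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
using System.Windows.Forms;

public partial class MainForm : Form
{
    [UnmanagedFunctionPointer( CallingConvention.StdCall )]
    public delegate void TextCallback( IntPtr ipText );

    [DllImport( "cppLib.dll", CallingConvention = CallingConvention.Cdecl )]
    public static extern void GetText(
        [MarshalAs( UnmanagedType.FunctionPtr )] TextCallback textCallback );

    // Kept as a field so the delegate outlives the native call; a local
    // variable could be collected while cppLib.dll still holds the pointer.
    private TextCallback m_TextCallback;

    private void StartReceivingText()
    {
        m_TextCallback = ipText =>
        {
            // Copy the NUL-terminated UTF-8 bytes and decode them.
            var data = new List<byte>();
            int off = 0;
            byte b;
            while( ( b = Marshal.ReadByte( ipText, off++ ) ) != 0 )
            {
                data.Add( b );
            }
            string text = Encoding.UTF8.GetString( data.ToArray() );

            // The callback may fire on a worker thread; marshal the UI update.
            richTextBox1.BeginInvoke( new Action( () => richTextBox1.AppendText( text ) ) );
        };

        GetText( m_TextCallback );
    }
}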
asked Aug 4, 2014 at 20:59
3 Comments
  • What is the signature of the native callback? Commented Aug 4, 2014 at 21:03
  • The first google result for "dllimport charset utf-8" is blog.kutulu.org/2012/04/… which looks quite relevant. I quickly skimmed it and its conclusion appears to be that the CharSet attribute doesn't handle conversion to/from UTF-8. Commented Aug 4, 2014 at 23:13
  • @TheUndeadFish. My heartfelt thanks were removed for some reason. Commented Aug 5, 2014 at 1:39

1 Answer


You should use CharSet.Unicode (if the string is a wchar_t*, 2 bytes per char) or CharSet.Ansi (if the string is 1 byte per char).
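For reference, this is what those two choices look like on the delegate declaration (a sketch added here, not part of the original answer); note that neither setting decodes UTF-8:

// If the native callback passed a wchar_t* (UTF-16 on Windows):
[UnmanagedFunctionPointer( CallingConvention.StdCall, CharSet = CharSet.Unicode )]
public delegate void WideTextCallback( string sText );

// If the native callback passed a char* in the system ANSI code page:
[UnmanagedFunctionPointer( CallingConvention.StdCall, CharSet = CharSet.Ansi )]
public delegate void AnsiTextCallback( string sText );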

Since your string is in UTF-8, you should convert by hand; none of the default conversions fits your problem.
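A hand-rolled conversion can look like the helper below (a sketch along the lines of the workaround above, not the answerer's code); on newer runtimes (.NET Core / .NET 5+, but not the .NET Framework) Marshal.PtrToStringUTF8 does the same thing in one call:

using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeStrings
{
    // Counts the bytes up to the NUL terminator, copies them into a managed
    // array, and decodes them as UTF-8.
    public static string Utf8PtrToString( IntPtr ptr )
    {
        if( ptr == IntPtr.Zero )
        {
            return null;
        }

        int length = 0;
        while( Marshal.ReadByte( ptr, length ) != 0 )
        {
            length++;
        }

        byte[] buffer = new byte[length];
        Marshal.Copy( ptr, buffer, 0, length );
        return Encoding.UTF8.GetString( buffer );
    }
}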

answered Aug 4, 2014 at 21:49
Eric Lemes

9 Comments

Thanks, I had missed declaring the delegate as a global. Now I have set the charset to Unicode. However, the same garbage is still printed for non-ASCII characters.
Probably you don't have a Unicode string inside this pointer. You can have several different types of string in C++. The type you specify on the delegate is the type you expect C++ to send you, not the type you want to use internally in .NET. You need to specify the right type for .NET to handle it correctly. If it is a wide string, it will parse the chars in pairs (2 bytes per char). If it is UTF-8, it will parse 1 byte per simple char and 1 to 4 bytes per special char... that's the idea.
Yep, that's the point. The string is in UTF-8. And, yes, C# makes it arrive already "doubled up by brute force", thus converted, but wrongly converted. I found a workaround and updated the original question.
In other words... you've converted by hand. I'm curious about why it didn't work. I have a theory. "Unicode" for C++ means two bytes per char, not real UTF-8 (1 byte per simple char, 1 to 4 bytes per special char). Probably C++ is sending UTF-8 while specifying "Unicode" in the callback expects two bytes per char. Makes sense? (See the byte-count sketch after these comments.)
Maybe in C++ a function like WideCharToMultiByte is being called prior to sending the char* to C#: msdn.microsoft.com/en-us/library/windows/desktop/…
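To make the byte-width theory in the comments concrete (an added sketch, not part of the thread):

using System;
using System.Text;

class EncodingWidthDemo
{
    static void Main()
    {
        // "Unicode" in .NET marshalling terms means UTF-16: every BMP char
        // takes exactly 2 bytes. UTF-8 uses 1 to 4 bytes per char.
        Console.WriteLine( Encoding.Unicode.GetBytes( "a" ).Length ); // 2
        Console.WriteLine( Encoding.UTF8.GetBytes( "a" ).Length );    // 1

        Console.WriteLine( Encoding.Unicode.GetBytes( "é" ).Length ); // 2
        Console.WriteLine( Encoding.UTF8.GetBytes( "é" ).Length );    // 2

        Console.WriteLine( Encoding.Unicode.GetBytes( "€" ).Length ); // 2
        Console.WriteLine( Encoding.UTF8.GetBytes( "€" ).Length );    // 3
    }
}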
