[lldb][windows] force the console to use a UTF-8 codepage #149493


Open
charles-zablit wants to merge 2 commits into llvm:main from charles-zablit:charles-zablit/lldb/fix-unicode-support-windows

charles-zablit
Contributor

This patch sets the codepage of the parent Windows console to utf-8 and resets it back to the original codepage once lldb exits.

This fixes a rendering issue where the characters defined in DiagnosticsRendering.cpp ("╰" for instance) are not rendered properly on Windows out of the box, because the default codepage is not utf-8.

This solution is based on this SO thread and this patch downstream.

rdar://156064500
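
To illustrate why the code page matters here: "╰" (U+2570) is three bytes in UTF-8, and a console whose output code page is not UTF-8 treats each of those bytes as a separate character. The snippet below is a minimal sketch and not part of the patch; `HexBytes`, `kCorner`, and `PrintCornerBytes` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format each byte of `s` as two lowercase hex digits, separated by spaces.
std::string HexBytes(const std::string &s) {
  std::string out;
  char buf[4];
  for (unsigned char c : s) {
    if (!out.empty())
      out += ' ';
    std::snprintf(buf, sizeof(buf), "%02x", c);
    out += buf;
  }
  return out;
}

// U+2570 spelled with explicit byte escapes, so the value does not depend on
// the compiler's source or execution character set.
const std::string kCorner = "\xE2\x95\xB0";

// Prints "e2 95 b0": one code point, three bytes on the wire.
void PrintCornerBytes() {
  assert(kCorner.size() == 3);
  std::puts(HexBytes(kCorner).c_str());
}
```

Those three bytes reach the console unchanged; whether they are drawn as a single box-drawing character depends on the console's output code page, which is what the patch changes.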

@llvmbot
Member

@llvm/pr-subscribers-lldb

Author: Charles Zablit (charles-zablit)

Changes

This patch sets the codepage of the parent Windows console to utf-8 and resets it back to the original codepage once lldb exits.

This fixes a rendering issue where the characters defined in DiagnosticsRendering.cpp ("╰" for instance) are not rendered properly on Windows out of the box, because the default codepage is not utf-8.

This solution is based on this SO thread and this patch downstream.

rdar://156064500


Full diff: https://github.com/llvm/llvm-project/pull/149493.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/Platform/Windows/PlatformWindows.cpp (+20)
  • (modified) lldb/source/Plugins/Platform/Windows/PlatformWindows.h (+8)
diff --git a/lldb/source/Plugins/Platform/Windows/PlatformWindows.cpp b/lldb/source/Plugins/Platform/Windows/PlatformWindows.cpp
index c0c26cc5f1954..d3e981de81313 100644
--- a/lldb/source/Plugins/Platform/Windows/PlatformWindows.cpp
+++ b/lldb/source/Plugins/Platform/Windows/PlatformWindows.cpp
@@ -41,6 +41,10 @@ LLDB_PLUGIN_DEFINE(PlatformWindows)

 static uint32_t g_initialize_count = 0;

+#if defined(_WIN32)
+std::optional<UINT> g_prev_console_cp = std::nullopt;
+#endif
+
 PlatformSP PlatformWindows::CreateInstance(bool force,
                                            const lldb_private::ArchSpec *arch) {
   // The only time we create an instance is when we are creating a remote
@@ -98,6 +102,7 @@ void PlatformWindows::Initialize() {
     default_platform_sp->SetSystemArchitecture(HostInfo::GetArchitecture());
     Platform::SetHostPlatform(default_platform_sp);
 #endif
+    SetConsoleCodePage();
     PluginManager::RegisterPlugin(
         PlatformWindows::GetPluginNameStatic(false),
         PlatformWindows::GetPluginDescriptionStatic(false),
@@ -108,6 +113,7 @@ void PlatformWindows::Initialize() {
 void PlatformWindows::Terminate() {
   if (g_initialize_count > 0) {
     if (--g_initialize_count == 0) {
+      ResetConsoleCodePage();
       PluginManager::UnregisterPlugin(PlatformWindows::CreateInstance);
     }
   }
@@ -808,3 +814,17 @@ extern "C" {

   return Status();
 }
+
+void PlatformWindows::SetConsoleCodePage() {
+  #if defined(_WIN32)
+    g_prev_console_cp = GetConsoleOutputCP();
+    SetConsoleOutputCP(CP_UTF8);
+  #endif
+}
+
+void PlatformWindows::ResetConsoleCodePage() {
+  #if defined(_WIN32)
+  if (g_prev_console_cp)
+    SetConsoleOutputCP(*g_prev_console_cp);
+  #endif
+}
diff --git a/lldb/source/Plugins/Platform/Windows/PlatformWindows.h b/lldb/source/Plugins/Platform/Windows/PlatformWindows.h
index 771133f341e90..d14aa52e5e1c8 100644
--- a/lldb/source/Plugins/Platform/Windows/PlatformWindows.h
+++ b/lldb/source/Plugins/Platform/Windows/PlatformWindows.h
@@ -80,6 +80,14 @@ class PlatformWindows : public RemoteAwarePlatform {
   size_t GetSoftwareBreakpointTrapOpcode(Target &target,
                                          BreakpointSite *bp_site) override;

+  /// Set the current console's code page to UTF-8 and store the previous
+  /// codepage in \a g_prev_console_cp.
+  static void SetConsoleCodePage();
+
+  /// Reset the current console's code page to the value stored
+  /// in \a g_prev_console_cp if any.
+  static void ResetConsoleCodePage();
+
   std::vector<ArchSpec> m_supported_architectures;

 private:
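
The save/set/restore pattern in the diff above can also be expressed as a scope guard. The following is a minimal sketch under stated assumptions, not the patch's implementation: `ScopedUtf8Console`, `GetOutputCodePage`, `SetOutputCodePage`, and `RunWithUtf8Console` are hypothetical names, the non-Windows branch is a stand-in that fakes a console starting at code page 437 so the sketch builds anywhere, and a return value of 0 from `GetConsoleOutputCP` is assumed to mean that the query failed (for example because no console is attached).

```cpp
#include <cassert>
#include <optional>

#if defined(_WIN32)
#include <windows.h>
#endif

constexpr unsigned kUtf8CodePage = 65001; // numeric value of CP_UTF8

#if defined(_WIN32)
inline unsigned GetOutputCodePage() { return GetConsoleOutputCP(); }
inline bool SetOutputCodePage(unsigned cp) {
  return SetConsoleOutputCP(cp) != 0;
}
#else
// Stand-in (assumption): a fake console so the sketch compiles off Windows.
inline unsigned &FakeCodePage() {
  static unsigned cp = 437;
  return cp;
}
inline unsigned GetOutputCodePage() { return FakeCodePage(); }
inline bool SetOutputCodePage(unsigned cp) {
  FakeCodePage() = cp;
  return true;
}
#endif

// Switches the console output code page to UTF-8 for the lifetime of the
// object and restores the previous value on destruction.
class ScopedUtf8Console {
public:
  ScopedUtf8Console() {
    const unsigned prev = GetOutputCodePage();
    // Remember the old value only if the switch actually happened.
    if (prev != 0 && prev != kUtf8CodePage &&
        SetOutputCodePage(kUtf8CodePage))
      m_prev = prev;
  }
  ~ScopedUtf8Console() {
    if (m_prev)
      SetOutputCodePage(*m_prev);
  }
  ScopedUtf8Console(const ScopedUtf8Console &) = delete;
  ScopedUtf8Console &operator=(const ScopedUtf8Console &) = delete;

  bool Changed() const { return m_prev.has_value(); }

private:
  std::optional<unsigned> m_prev;
};

// Usage: the code page seen by the parent shell is unchanged afterwards.
inline void RunWithUtf8Console() {
  const unsigned before = GetOutputCodePage();
  {
    ScopedUtf8Console guard;
    // ... interactive session would run here ...
  }
  assert(GetOutputCodePage() == before);
}
```

Like the patch, a guard of this kind only restores the code page when the destructor runs, that is, on a graceful exit. Where such an object would live in lldb is not something this thread settles.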

@charles-zablit
Contributor, Author

Before

Screenshot 2025-07-18 at 12 24 31

After

Screenshot 2025-07-18 at 12 24 13

@github-actions (GitHub Actions)

github-actions bot commented Jul 18, 2025 (edited)

✅ With the latest revision this PR passed the C/C++ code formatter.

@DavidSpickett
Collaborator

I opened an issue for this: #142568.

Where @Nerixyz mentions that SetConsoleOutputCP might have problems in cmd.exe (which probably means conhost, the original Windows terminal host, as opposed to "Windows Terminal", the new one).

@DavidSpickett changed the title from "[windows][lldb] force the console to use a UTF-8 codepage" to "[lldb][windows] force the console to use a UTF-8 codepage" (Jul 18, 2025)
@DavidSpickett
Collaborator

Windows Terminal is the default on Windows 10 at least. I think buildbot launches things in conhost, but if utf-8 were a problem for tests there, we would have seen it before now.

@charles-zablit
Contributor, Author

charles-zablit commented Jul 18, 2025 (edited)

> I opened an issue for this: #142568.
>
> Where @Nerixyz mentions that SetConsoleOutputCP might have problems in cmd.exe (which probably means conhost, the original Windows terminal host, as opposed to "Windows Terminal", the new one).

From my understanding, the original issue in jq is that they did not reset the codepage after the program had exited. My patch does reset it if lldb exits gracefully.

The /utf-8 approach seems promising as well, but I find it suspicious that no other project uses it.

Another temporary fix for this, while we figure out a long-term solution, could be to force the use of the ANSI characters on Windows.

@charles-zablit
Contributor, Author

charles-zablit commented Jul 18, 2025 (edited)

> Windows Terminal is the default on Windows 10 at least. I think buildbot launches things in conhost, but if utf-8 were a problem for tests there, we would have seen it before now.

Maybe I misunderstood your comment, but I was able to reproduce this with the latest release of lldb in the Windows 11 terminal.

Screenshot 2025-07-18 at 13 06 14

@DavidSpickett
Collaborator

Yes, that makes sense, Windows Terminal doesn't default to utf-8 either. I was thinking of something else.

My concern was: what if we added this new code, it tried to set utf-8, and a test relied on that? This is not a problem, because we already test this annotation feature via conhost and it has no problems. So even if the calls do nothing, it doesn't matter.

In other words: tests on Windows aren't scraping the output of the terminal, they'll be reading strings internally and not care about the code page.

Which is a good thing.

@charles-zablit
Contributor, Author

charles-zablit commented Jul 18, 2025 (edited)

From my understanding of the thread you linked, there are 3 ways to approach this:

  1. Switch to ASCII characters on Windows instead of the "╰" character. This is by far the easiest way to fix this specific rendering issue, but does not address the root issue. Debugging a program with non-ASCII characters will still break.
  2. Set the code page when lldb starts. Reset the codepage when it exits. This used to be a no-go because it would cause some resizing in CMD.exe, but that was over 6 years ago. Terminal is the default console in Windows 11 as of 2022.
  3. Use /execution-charset:utf-8 as @Nerixyz suggested. I will start a build of lldb with this change. If this does not have the same problems as manually setting the code page, this sounds like the most appealing solution.
  4. (bonus) Go the Python way and build a wrapper using WriteConsoleW, which would properly address the issue. This would however require a lot of engineering (this resolved "Less than ideal handling of variable names in Cyrillic alphabet" #35615).
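
Option 1 in the list above could be sketched as a glyph choice keyed on the console's output code page. This is an illustration only: `CornerGlyph` is a hypothetical helper, the ASCII replacement "`" is an arbitrary choice made for this sketch, and nothing in the thread says DiagnosticsRendering.cpp is or would be structured this way. On Windows the argument would presumably come from `GetConsoleOutputCP()`.

```cpp
#include <cassert>
#include <string>

constexpr unsigned kUtf8CodePage = 65001; // numeric value of CP_UTF8

// Return the corner glyph as UTF-8 bytes when the console decodes UTF-8,
// and a plain ASCII substitute otherwise.
inline const char *CornerGlyph(unsigned output_code_page) {
  if (output_code_page == kUtf8CodePage)
    return "\xE2\x95\xB0"; // U+2570, the character named in the PR
  return "`";              // assumption: any 7-bit stand-in would do
}

// Build one annotation line, e.g. the glyph followed by a message.
inline std::string AnnotationLine(unsigned output_code_page,
                                  const std::string &message) {
  return std::string(CornerGlyph(output_code_page)) + " " + message;
}
```

As the list item notes, this only sidesteps the one glyph; any non-ASCII text from the debugged program would still depend on the code page.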

@charles-zablit
Contributor, Author

I added add_compile_options(/execution-charset:utf-8) to llvm-project\lldb\CMakeLists.txt; however, that did not fix the issue.

@Nerixyz
Contributor

> I added add_compile_options(/execution-charset:utf-8) to llvm-project\lldb\CMakeLists.txt; however, that did not fix the issue.

Then setting the code page is probably the best idea, requiring the least amount of effort.

As far as I know, WriteConsoleW is the proper way to get Unicode on the console (this resolved #35615). However, that would require a new output, because currently, everything goes through the stdout FD and the C API.

@charles-zablit
Contributor, Author

> As far as I know, WriteConsoleW is the proper way to get Unicode on the console (this resolved #35615). However, that would require a new output, because currently, everything goes through the stdout FD and the C API.

Thanks for clarifying, I corrected the 4th option.

2b8c692 does not look like such a big change. Is lldb's mechanism for printing so different from clang's? I can't find where a Stream actually gets "printed" to stdout.

@Nerixyz
Contributor

> I can't find where a Stream actually gets "printed" to stdout.

A NativeFile stream is used, which is created here.

> 2b8c692 does not look like such a big change. Is lldb's mechanism for printing so different from clang's?

I agree, that would probably be of a similar size here (i.e. check for the file handle and call the windows impl if needed).
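
The "check for the file handle and call the Windows impl if needed" idea can be sketched roughly as follows. This is a hedged illustration, not lldb's or clang's code: `WriteUtf8` is a hypothetical helper, and it assumes that a successful `GetConsoleMode` call is an adequate test for "this handle is a console". Off Windows, or when output is redirected, it falls back to the C API.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

#if defined(_WIN32)
#include <io.h>
#include <windows.h>
#endif

// Write UTF-8 text to `out`. If `out` is a Windows console, convert to UTF-16
// and use WriteConsoleW so the result does not depend on the code page.
bool WriteUtf8(std::FILE *out, std::string_view text) {
  assert(out != nullptr);
#if defined(_WIN32)
  HANDLE handle = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(out)));
  DWORD mode = 0;
  if (handle != INVALID_HANDLE_VALUE && GetConsoleMode(handle, &mode)) {
    if (text.empty())
      return true;
    std::fflush(out); // keep ordering with earlier buffered C output
    const int len = static_cast<int>(text.size());
    const int wide_len =
        MultiByteToWideChar(CP_UTF8, 0, text.data(), len, nullptr, 0);
    if (wide_len <= 0)
      return false;
    std::wstring wide(static_cast<size_t>(wide_len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, text.data(), len, wide.data(), wide_len);
    DWORD written = 0;
    return WriteConsoleW(handle, wide.data(), static_cast<DWORD>(wide_len),
                         &written, nullptr) != 0;
  }
#endif
  // Not a console (file, pipe) or not Windows: pass the bytes through as-is.
  return std::fwrite(text.data(), 1, text.size(), out) == text.size();
}
```

A call such as `WriteUtf8(stdout, "\xE2\x95\xB0 note\n")` would take the `WriteConsoleW` path in an interactive Windows console and the `fwrite` path everywhere else. Unlike changing the code page, this approach leaves the parent console's settings untouched.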

Reviewers

@JDevlieghere: awaiting requested review (code owner)
@compnerd: awaiting requested review
@Michael137: awaiting requested review

Assignees

@charles-zablit


4 participants: @charles-zablit, @llvmbot, @DavidSpickett, @Nerixyz
