Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[clangd] Improve Markup Rendering#140498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
tcottin wants to merge6 commits intollvm:main
base:main
Choose a base branch
Loading
fromtcottin:improve-markup-rendering

Conversation

tcottin
Copy link

This is a preparation for fixingclangd/clangd#529.

It changes the Markup rendering to markdown and plaintext.

  • Properly separate paragraphs using an empty line between
  • Dont escape markdown syntax for markdown output except for HTML
  • Dont do any formatting for markdown because the client is handling the actual markdown rendering

flocc, lucobellic, aaronliu0130, marcogmaia, sr-tream, codeinred, liftchampion, wkaisertexas, mcdoemer, quanzhuo, and 3 more reacted with thumbs up emojiNerixyz, lucobellic, pl-nexelec, marcogmaia, yksen, codeinred, liftchampion, wkaisertexas, mcdoemer, mccakit, and 3 more reacted with heart emoji
@github-actionsGitHub Actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using@ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by theLLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on theLLVM Discord or on theforums.

@llvmbot
Copy link
Member

llvmbot commentedMay 19, 2025
edited
Loading

@llvm/pr-subscribers-clangd

@llvm/pr-subscribers-clang-tools-extra

Author: None (tcottin)

Changes

This is a preparation for fixing clangd/clangd#529.

It changes the Markup rendering to markdown and plaintext.

  • Properly separate paragraphs using an empty line between
  • Dont escape markdown syntax for markdown output except for HTML
  • Dont do any formatting for markdown because the client is handling the actual markdown rendering

Patch is 41.40 KiB, truncated to 20.00 KiB below, full version:https://github.com/llvm/llvm-project/pull/140498.diff

7 Files Affected:

  • (modified) clang-tools-extra/clangd/Hover.cpp (+13-68)
  • (modified) clang-tools-extra/clangd/support/Markup.cpp (+147-105)
  • (modified) clang-tools-extra/clangd/support/Markup.h (+26-6)
  • (modified) clang-tools-extra/clangd/test/signature-help.test (+2-2)
  • (modified) clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp (+4-4)
  • (modified) clang-tools-extra/clangd/unittests/HoverTests.cpp (+51-24)
  • (modified) clang-tools-extra/clangd/unittests/support/MarkupTests.cpp (+167-47)
diff --git a/clang-tools-extra/clangd/Hover.cpp b/clang-tools-extra/clangd/Hover.cppindex 3ab3d89030520..88755733aa67c 100644--- a/clang-tools-extra/clangd/Hover.cpp+++ b/clang-tools-extra/clangd/Hover.cpp@@ -960,42 +960,6 @@ std::optional<HoverInfo> getHoverContents(const Attr *A, ParsedAST &AST) {   return HI; }-bool isParagraphBreak(llvm::StringRef Rest) {-  return Rest.ltrim(" \t").starts_with("\n");-}--bool punctuationIndicatesLineBreak(llvm::StringRef Line) {-  constexpr llvm::StringLiteral Punctuation = R"txt(.:,;!?)txt";--  Line = Line.rtrim();-  return !Line.empty() && Punctuation.contains(Line.back());-}--bool isHardLineBreakIndicator(llvm::StringRef Rest) {-  // '-'/'*' md list, '@'/'\' documentation command, '>' md blockquote,-  // '#' headings, '`' code blocks-  constexpr llvm::StringLiteral LinebreakIndicators = R"txt(-*@\>#`)txt";--  Rest = Rest.ltrim(" \t");-  if (Rest.empty())-    return false;--  if (LinebreakIndicators.contains(Rest.front()))-    return true;--  if (llvm::isDigit(Rest.front())) {-    llvm::StringRef AfterDigit = Rest.drop_while(llvm::isDigit);-    if (AfterDigit.starts_with(".") || AfterDigit.starts_with(")"))-      return true;-  }-  return false;-}--bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) {-  // Should we also consider whether Line is short?-  return punctuationIndicatesLineBreak(Line) || isHardLineBreakIndicator(Rest);-}- void addLayoutInfo(const NamedDecl &ND, HoverInfo &HI) {   if (ND.isInvalidDecl())     return;@@ -1601,51 +1565,32 @@ std::optional<llvm::StringRef> getBacktickQuoteRange(llvm::StringRef Line,   return Line.slice(Offset, Next + 1); }-void parseDocumentationLine(llvm::StringRef Line, markup::Paragraph &Out) {+void parseDocumentationParagraph(llvm::StringRef Text, markup::Paragraph &Out) {   // Probably this is appendText(Line), but scan for something interesting.-  for (unsigned I = 0; I < Line.size(); ++I) {-    switch (Line[I]) {+  for (unsigned I = 0; I < Text.size(); ++I) {+    switch (Text[I]) {     case '`':-      if (auto Range = getBacktickQuoteRange(Line, I)) {-        Out.appendText(Line.substr(0, I));+      if (auto Range = getBacktickQuoteRange(Text, I)) {+        Out.appendText(Text.substr(0, I));         Out.appendCode(Range->trim("`"), /*Preserve=*/true);-        return parseDocumentationLine(Line.substr(I + Range->size()), Out);+        return parseDocumentationParagraph(Text.substr(I + Range->size()), Out);       }       break;     }   }-  Out.appendText(Line).appendSpace();+  Out.appendText(Text); }  void parseDocumentation(llvm::StringRef Input, markup::Document &Output) {-  std::vector<llvm::StringRef> ParagraphLines;-  auto FlushParagraph = [&] {-    if (ParagraphLines.empty())-      return;-    auto &P = Output.addParagraph();-    for (llvm::StringRef Line : ParagraphLines)-      parseDocumentationLine(Line, P);-    ParagraphLines.clear();-  };+  llvm::StringRef Paragraph, Rest;+  for (std::tie(Paragraph, Rest) = Input.split("\n\n");+       !(Paragraph.empty() && Rest.empty());+       std::tie(Paragraph, Rest) = Rest.split("\n\n")) {-  llvm::StringRef Line, Rest;-  for (std::tie(Line, Rest) = Input.split('\n');-       !(Line.empty() && Rest.empty());-       std::tie(Line, Rest) = Rest.split('\n')) {--    // After a linebreak remove spaces to avoid 4 space markdown code blocks.-    // FIXME: make FlushParagraph handle this.-    Line = Line.ltrim();-    if (!Line.empty())-      ParagraphLines.push_back(Line);--    if (isParagraphBreak(Rest) || isHardLineBreakAfter(Line, Rest)) {-      FlushParagraph();-    }+    if (!Paragraph.empty())+      parseDocumentationParagraph(Paragraph, Output.addParagraph());   }-  FlushParagraph(); }- llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,                               const HoverInfo::PrintedType &T) {   OS << T.Type;diff --git a/clang-tools-extra/clangd/support/Markup.cpp b/clang-tools-extra/clangd/support/Markup.cppindex 63aff96b02056..b1e6252e473f5 100644--- a/clang-tools-extra/clangd/support/Markup.cpp+++ b/clang-tools-extra/clangd/support/Markup.cpp@@ -11,7 +11,6 @@ #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h"-#include "llvm/Support/Compiler.h" #include "llvm/Support/raw_ostream.h" #include <cstddef> #include <iterator>@@ -56,80 +55,28 @@ bool looksLikeTag(llvm::StringRef Contents) {   return true; // Potentially incomplete tag. }-// Tests whether C should be backslash-escaped in markdown.-// The string being escaped is Before + C + After. This is part of a paragraph.-// StartsLine indicates whether `Before` is the start of the line.-// After may not be everything until the end of the line.-//-// It's always safe to escape punctuation, but want minimal escaping.-// The strategy is to escape the first character of anything that might start-// a markdown grammar construct.-bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,-                        bool StartsLine) {-  assert(Before.take_while(llvm::isSpace).empty());-  auto RulerLength = [&]() -> /*Length*/ unsigned {-    if (!StartsLine || !Before.empty())-      return false;-    llvm::StringRef A = After.rtrim();-    return llvm::all_of(A, [C](char D) { return C == D; }) ? 1 + A.size() : 0;-  };-  auto IsBullet = [&]() {-    return StartsLine && Before.empty() &&-           (After.empty() || After.starts_with(" "));-  };-  auto SpaceSurrounds = [&]() {-    return (After.empty() || llvm::isSpace(After.front())) &&-           (Before.empty() || llvm::isSpace(Before.back()));-  };-  auto WordSurrounds = [&]() {-    return (!After.empty() && llvm::isAlnum(After.front())) &&-           (!Before.empty() && llvm::isAlnum(Before.back()));-  };-+/// \brief Tests whether \p C should be backslash-escaped in markdown.+///+/// The MarkupContent LSP specification defines that `markdown` content needs to+/// follow GFM (GitHub Flavored Markdown) rules. And we can assume that markdown+/// is rendered on the client side. This means we do not need to escape any+/// markdown constructs.+/// The only exception is when the client does not support HTML rendering in+/// markdown. In that case, we need to escape HTML tags and HTML entities.+///+/// **FIXME:** handle the case when the client does support HTML rendering in+/// markdown. For this, the LSP server needs to check the+/// [supportsHtml capability](https://github.com/microsoft/language-server-protocol/issues/1344)+/// of the client.+///+/// \param C The character to check.+/// \param After The string that follows \p C . This is used to determine if \p C is+///             part of a tag or an entity reference.+/// \returns true if \p C should be escaped, false otherwise.+bool needsLeadingEscape(char C, llvm::StringRef After) {   switch (C) {-  case '\\': // Escaped character.-    return true;-  case '`': // Code block or inline code-    // Any number of backticks can delimit an inline code block that can end-    // anywhere (including on another line). We must escape them all.-    return true;-  case '~': // Code block-    return StartsLine && Before.empty() && After.starts_with("~~");-  case '#': { // ATX heading.-    if (!StartsLine || !Before.empty())-      return false;-    llvm::StringRef Rest = After.ltrim(C);-    return Rest.empty() || Rest.starts_with(" ");-  }-  case ']': // Link or link reference.-    // We escape ] rather than [ here, because it's more constrained:-    //   ](...) is an in-line link-    //   ]: is a link reference-    // The following are only links if the link reference exists:-    //   ] by itself is a shortcut link-    //   ][...] is an out-of-line link-    // Because we never emit link references, we don't need to handle these.-    return After.starts_with(":") || After.starts_with("(");-  case '=': // Setex heading.-    return RulerLength() > 0;-  case '_': // Horizontal ruler or matched delimiter.-    if (RulerLength() >= 3)-      return true;-    // Not a delimiter if surrounded by space, or inside a word.-    // (The rules at word boundaries are subtle).-    return !(SpaceSurrounds() || WordSurrounds());-  case '-': // Setex heading, horizontal ruler, or bullet.-    if (RulerLength() > 0)-      return true;-    return IsBullet();-  case '+': // Bullet list.-    return IsBullet();-  case '*': // Bullet list, horizontal ruler, or delimiter.-    return IsBullet() || RulerLength() >= 3 || !SpaceSurrounds();   case '<': // HTML tag (or autolink, which we choose not to escape)     return looksLikeTag(After);-  case '>': // Quote marker. Needs escaping at start of line.-    return StartsLine && Before.empty();   case '&': { // HTML entity reference     auto End = After.find(';');     if (End == llvm::StringRef::npos)@@ -142,10 +89,6 @@ bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,     }     return llvm::all_of(Content, llvm::isAlpha);   }-  case '.': // Numbered list indicator. Escape 12. -> 12\. at start of line.-  case ')':-    return StartsLine && !Before.empty() &&-           llvm::all_of(Before, llvm::isDigit) && After.starts_with(" ");   default:     return false;   }@@ -156,8 +99,7 @@ bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After, std::string renderText(llvm::StringRef Input, bool StartsLine) {   std::string R;   for (unsigned I = 0; I < Input.size(); ++I) {-    if (needsLeadingEscape(Input[I], Input.substr(0, I), Input.substr(I + 1),-                           StartsLine))+    if (needsLeadingEscape(Input[I], Input.substr(I + 1)))       R.push_back('\\');     R.push_back(Input[I]);   }@@ -303,11 +245,12 @@ class CodeBlock : public Block { std::string indentLines(llvm::StringRef Input) {   assert(!Input.ends_with("\n") && "Input should've been trimmed.");   std::string IndentedR;-  // We'll add 2 spaces after each new line.+  // We'll add 2 spaces after each new line which is not followed by another new line.   IndentedR.reserve(Input.size() + Input.count('\n') * 2);-  for (char C : Input) {+  for (size_t I = 0; I < Input.size(); ++I) {+    char C = Input[I];     IndentedR += C;-    if (C == '\n')+    if (C == '\n' && (((I + 1) < Input.size()) && (Input[I + 1] != '\n')))       IndentedR.append("  ");   }   return IndentedR;@@ -348,20 +291,24 @@ void Paragraph::renderMarkdown(llvm::raw_ostream &OS) const {     if (C.SpaceBefore || NeedsSpace)       OS << " ";     switch (C.Kind) {-    case Chunk::PlainText:+    case ChunkKind::PlainText:       OS << renderText(C.Contents, !HasChunks);       break;-    case Chunk::InlineCode:+    case ChunkKind::InlineCode:       OS << renderInlineBlock(C.Contents);       break;+    case ChunkKind::Bold:+      OS << "**" << renderText(C.Contents, !HasChunks) << "**";+      break;+    case ChunkKind::Emphasized:+      OS << "*" << renderText(C.Contents, !HasChunks) << "*";+      break;     }     HasChunks = true;     NeedsSpace = C.SpaceAfter;   }-  // Paragraphs are translated into markdown lines, not markdown paragraphs.-  // Therefore it only has a single linebreak afterwards.-  // VSCode requires two spaces at the end of line to start a new one.-  OS << "  \n";+  // A paragraph in markdown is separated by a blank line.+  OS << "\n\n"; }  std::unique_ptr<Block> Paragraph::clone() const {@@ -370,8 +317,8 @@ std::unique_ptr<Block> Paragraph::clone() const {  /// Choose a marker to delimit `Text` from a prioritized list of options. /// This is more readable than escaping for plain-text.-llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,-                             llvm::StringRef Text) {+llvm::StringRef Paragraph::chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,+                                        llvm::StringRef Text) const {   // Prefer a delimiter whose characters don't appear in the text.   for (llvm::StringRef S : Options)     if (Text.find_first_of(S) == llvm::StringRef::npos)@@ -379,18 +326,94 @@ llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,   return Options.front(); }+bool Paragraph::punctuationIndicatesLineBreak(llvm::StringRef Line) const{+  constexpr llvm::StringLiteral Punctuation = R"txt(.:,;!?)txt";++  Line = Line.rtrim();+  return !Line.empty() && Punctuation.contains(Line.back());+}++bool Paragraph::isHardLineBreakIndicator(llvm::StringRef Rest) const {+  // '-'/'*' md list, '@'/'\' documentation command, '>' md blockquote,+  // '#' headings, '`' code blocks, two spaces (markdown force newline)+  constexpr llvm::StringLiteral LinebreakIndicators = R"txt(-*@\>#`)txt";++  Rest = Rest.ltrim(" \t");+  if (Rest.empty())+    return false;++  if (LinebreakIndicators.contains(Rest.front()))+    return true;++  if (llvm::isDigit(Rest.front())) {+    llvm::StringRef AfterDigit = Rest.drop_while(llvm::isDigit);+    if (AfterDigit.starts_with(".") || AfterDigit.starts_with(")"))+      return true;+  }+  return false;+}++bool Paragraph::isHardLineBreakAfter(llvm::StringRef Line,+                                     llvm::StringRef Rest) const {+  // In Markdown, 2 spaces before a line break forces a line break.+  // Add a line break for plaintext in this case too.+  // Should we also consider whether Line is short?+  return Line.ends_with("  ") || punctuationIndicatesLineBreak(Line) ||+         isHardLineBreakIndicator(Rest);+}+ void Paragraph::renderPlainText(llvm::raw_ostream &OS) const {   bool NeedsSpace = false;+  std::string ConcatenatedText;+  llvm::raw_string_ostream ConcatenatedOS(ConcatenatedText);+   for (auto &C : Chunks) {++    if (C.Kind == ChunkKind::PlainText) {+      if (C.SpaceBefore || NeedsSpace)+        ConcatenatedOS << ' ';++      ConcatenatedOS << C.Contents;+      NeedsSpace = llvm::isSpace(C.Contents.back()) || C.SpaceAfter;+      continue;+    }+     if (C.SpaceBefore || NeedsSpace)-      OS << " ";+      ConcatenatedOS << ' ';     llvm::StringRef Marker = "";-    if (C.Preserve && C.Kind == Chunk::InlineCode)+    if (C.Preserve && C.Kind == ChunkKind::InlineCode)       Marker = chooseMarker({"`", "'", "\""}, C.Contents);-    OS << Marker << C.Contents << Marker;+    else if (C.Kind == ChunkKind::Bold)+      Marker = "**";+    else if (C.Kind == ChunkKind::Emphasized)+      Marker = "*";+    ConcatenatedOS << Marker << C.Contents << Marker;     NeedsSpace = C.SpaceAfter;   }-  OS << '\n';++  // We go through the contents line by line to handle the newlines+  // and required spacing correctly.+  llvm::StringRef Line, Rest;++  for (std::tie(Line, Rest) =+           llvm::StringRef(ConcatenatedText).trim().split('\n');+       !(Line.empty() && Rest.empty());+       std::tie(Line, Rest) = Rest.split('\n')) {++    Line = Line.ltrim();+    if (Line.empty())+      continue;++    OS << canonicalizeSpaces(Line);++    if (isHardLineBreakAfter(Line, Rest))+      OS << '\n';+    else if (!Rest.empty())+      OS << ' ';+  }++  // Paragraphs are separated by a blank line.+  OS << "\n\n"; }  BulletList::BulletList() = default;@@ -398,12 +421,13 @@ BulletList::~BulletList() = default;  void BulletList::renderMarkdown(llvm::raw_ostream &OS) const {   for (auto &D : Items) {+    std::string M = D.asMarkdown();     // Instead of doing this we might prefer passing Indent to children to get     // rid of the copies, if it turns out to be a bottleneck.-    OS << "- " << indentLines(D.asMarkdown()) << '\n';+    OS << "- " << indentLines(M) << '\n';   }   // We need a new line after list to terminate it in markdown.-  OS << '\n';+  OS << "\n\n"; }  void BulletList::renderPlainText(llvm::raw_ostream &OS) const {@@ -412,6 +436,7 @@ void BulletList::renderPlainText(llvm::raw_ostream &OS) const {     // rid of the copies, if it turns out to be a bottleneck.     OS << "- " << indentLines(D.asPlainText()) << '\n';   }+  OS << '\n'; }  Paragraph &Paragraph::appendSpace() {@@ -420,29 +445,44 @@ Paragraph &Paragraph::appendSpace() {   return *this; }-Paragraph &Paragraph::appendText(llvm::StringRef Text) {-  std::string Norm = canonicalizeSpaces(Text);-  if (Norm.empty())+Paragraph &Paragraph::appendChunk(llvm::StringRef Contents, ChunkKind K) {+  if (Contents.empty())     return *this;   Chunks.emplace_back();   Chunk &C = Chunks.back();-  C.Contents = std::move(Norm);-  C.Kind = Chunk::PlainText;-  C.SpaceBefore = llvm::isSpace(Text.front());-  C.SpaceAfter = llvm::isSpace(Text.back());+  C.Contents = std::move(Contents);+  C.Kind = K;   return *this; }+Paragraph &Paragraph::appendText(llvm::StringRef Text) {+  if (!Chunks.empty() && Chunks.back().Kind == ChunkKind::PlainText) {+    Chunks.back().Contents += std::move(Text);+    return *this;+  }++  return appendChunk(Text, ChunkKind::PlainText);+}++Paragraph &Paragraph::appendEmphasizedText(llvm::StringRef Text) {+  return appendChunk(canonicalizeSpaces(std::move(Text)),+                     ChunkKind::Emphasized);+}++Paragraph &Paragraph::appendBoldText(llvm::StringRef Text) {+  return appendChunk(canonicalizeSpaces(std::move(Text)), ChunkKind::Bold);+}+ Paragraph &Paragraph::appendCode(llvm::StringRef Code, bool Preserve) {   bool AdjacentCode =-      !Chunks.empty() && Chunks.back().Kind == Chunk::InlineCode;+      !Chunks.empty() && Chunks.back().Kind == ChunkKind::InlineCode;   std::string Norm = canonicalizeSpaces(std::move(Code));   if (Norm.empty())     return *this;   Chunks.emplace_back();   Chunk &C = Chunks.back();   C.Contents = std::move(Norm);-  C.Kind = Chunk::InlineCode;+  C.Kind = ChunkKind::InlineCode;   C.Preserve = Preserve;   // Disallow adjacent code spans without spaces, markdown can't render them.   C.SpaceBefore = AdjacentCode;@@ -475,7 +515,9 @@ Paragraph &Document::addParagraph() {   return *static_cast<Paragraph *>(Children.back().get()); }-void Document::addRuler() { Children.push_back(std::make_unique<Ruler>()); }+void Document::addRuler() {+  Children.push_back(std::make_unique<Ruler>());+}  void Document::addCodeBlock(std::string Code, std::string Language) {   Children.emplace_back(diff --git a/clang-tools-extra/clangd/support/Markup.h b/clang-tools-extra/clangd/support/Markup.hindex 3a869c49a2cbb..a74fade13d115 100644--- a/clang-tools-extra/clangd/support/Markup.h+++ b/clang-tools-extra/clangd/support/Markup.h@@ -49,6 +49,12 @@ class Paragraph : public Block {   /// Append plain text to the end of the string.   Paragraph &appendText(llvm::StringRef Text);+  /// Append emphasized text, this translates to the * block in markdown.+  Paragraph &appendEmphasizedText(llvm::StringRef Text);++  /// Append bold text, this translates to the ** block in markdown.+  Paragraph &appendBoldText(llvm::StringRef Text);+   /// Append inline code, this translates to the ` block in markdown.   /// \p Preserve indicates the code span must be apparent even in plaintext.   Paragraph &appendCode(llvm::StringRef Code, bool Preserve = false);@@ -58,11 +64,9 @@ class Paragraph : public Block {   Paragraph &appendSpace();  private:+  typedef enum { PlainText, InlineCode, Bold, Emphasized } ChunkKind;   struct Chunk {-    enum {-      PlainText,-      InlineCode,-    } Kind = PlainText;+    ChunkKind Kind = PlainText;     // Preserve chunk markers in plaintext.     bool Preserve = false;     std::string Contents;@@ -73,6 +77,19 @@ class Paragraph : public Block {     bool SpaceAfter = false;   };   std::vector<Chunk> Chunks;++  Paragraph &appendChunk(llvm::StringRef Contents, ChunkKind K);++  llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,+                               llvm::StringRef Text) const;+  bool punctuationIndicatesLineBreak(llvm::StringRef Line) const;+  bool isHardLineBreakIndicator(llvm::StringRef Rest) const;+  bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) const;+};++class ListItemParagraph : public Paragraph {+public:+  void renderMarkdown(llvm::raw_ostream &OS) const override; };  /// Represents a sequence of one or more documents. Knows how to print them in a@@ -82,6 +99,9 @@ class BulletList : public Block {   BulletList();...[truncated]

@tcottin
Copy link
Author

@github-actionsGitHub Actions
Copy link

github-actionsbot commentedMay 30, 2025
edited
Loading

✅ With the latest revision this PR passed the C/C++ code formatter.

@HighCommander4
Copy link
Collaborator

Carrying over review request to@kadircet from the original combined patch at#128591 (comment):

@kadircet, you were involved with earlier rounds of review on the old patch (https://reviews.llvm.org/D143112) and design discussions in the issue (clangd/clangd#529) -- do you by chance have the cycles to pick up this review?

@@ -118,8 +138,8 @@ class Document {
BulletList &addBulletList();

/// Doesn't contain any trailing newlines.
/// We try to make the markdown human-readable, e.g. avoid extra escaping.
/// At least one client (coc.nvim) displays the markdown verbatim!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This deleted comment refers tococ.nvims rendering of markdown - would this PR cause a change of behavior for that use case? If I understand correctly, the prior work intentionally targeted clients, like this, with limited markdown rendering support.

If that's the case and we somehow need to find a balance between Markdown capabilities of different clients, could theMarkdownClientCapabilities flags be used to guard the necessary logic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This patch changes 2 things for markdown:

  • The documentation string is taken as is without escaping anything (except for html tags and entity references)
  • Paragraphs are separated with an empty line

My idea was to let the client handle the markdown rendering completely.
Therefore no escaping is needed and even false because if markdown annotations are escaped, the client will not render it as expected.
The same for the empty line between paragraphs: this is how paragraphs are separated in markdown.

I think it is save to assume that the client can handle unescaped markdown.
According to theLSP specification,
if a clients capability is specifying'markdown' asMarkupKind for itMarkupContent, the "content should follow the GitHub Flavored Markdown Specification".

Now regardingcoc.nvim: I dont use it and do not know the current behaviour.
Reading through e.g.clangd/coc-clangd#48 though, I understood thatcoc.nvim requestsMarkupContent with'markdown' but does not really render markdown.
Hence I would assume that in this case leaving out the escaping would result in just not showing the escaping character.
Regarding the empty newlines between paragraphs I would also expect that the empty line is just shown, creating better visual separations between paragraphs

But:@emaxx-google do you usecoc.nvim and could you give this patch a try?
Maybe my assumptions are false and it causes issues I did not see yet.

Regarding the capabilities: I think currently there is only'plaintext' and'markdown'.
Additionally there is the option to allow HTML tags when markdown is selected.
Other than that I dont know how we could handle this.
But I think we don't need to actually, because as stated above, if the client specifies to support markdown, it should handle it correctly.
If it can't handle it,'plaintext' is always an option.

Copy link
Contributor

@emaxx-googleemaxx-googleJun 3, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If I understand correctly, this PR assumes that the input documentation string is written using Markdown. The current implementation in Clangd, however, is much more conservative in its assumptions - it only recognizes basic syntax features like backticks. Hence when rendering formarkdown, everything else is escaped because the assumption is it's actually a plain text, so that all characters that'd (accidentally?) trigger special semantics in the Markdown client-side renderer are mangled to be not treated as such:

// Tests whether C should be backslash-escaped in markdown.<...>// It's always safe to escape punctuation, but want minimal escaping.// The strategy is to escape the first character of anything that might start// a markdown grammar construct.

A change to pass documentation strings as-is would cause regressions on code bases where comments aren't written with the markdown syntax in mind.

P.S. As for coc-nvim - I personally don't use it, but if we're to do radical changes to how things are output, it's worth it checking we don't regress the UI for this known case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the reply. I did not think about it this way.
The issue that you dont want your documentation to be rendered as markdown by the client seems different from the coc.nvim issue.

The old behaviour was to treat all documentation as non-markdown independent of client render capabilities.
Hence to even make this work for clients requesting markdown as the MarkupKind, markdown syntax in the documentation string was escaped.
This way, only information which was added by clangd (like the declaration of the hover point) is rendered in markdown, if the client supports it.

With this change, the documentation is taken as-is which leads to rendered markdown on clients requesting markdown.

If viewed this way, the escaping behaviour is actually not a client decision and independent of the clients capabilities.
Therefore we could introduce a server option to control the escaping behaviour I think.

Just for reference:doxygen supports markdown since 1.8.0. There is aMARKDOWN_SUPPORT configuration option which is set toYES by default.
This means in a doxygen comment, all markdown syntax will be rendered as markdown in the final documentation by default.
Hence the default doxygen behaviour is the same as the behaviour proposed with this change.

I would argue that for the markdown rendering behaviour of clangd it should be the same:
the proposed behaviour should be the default but could be switched off with a server option.

chriselrod reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think we should distinguish between the input documentation's syntax and the client's rendering capabilities. The latter defines the syntax in which it expects the data received from the server, and not that all input is forced to be treated as if written in Markdown. As said, we're suspecting that source comments in a lot of projects would be rendered incorrectly if they'd be all interpreted as verbatim Markdown as a blanket decision.

So guarding this "no escaping" behavior by an option is one possibility, yes. Another possible direction could be to decide based on the syntax used in the comments - i.e., recognizing those distinctive comment syntaxes fromhttps://www.doxygen.nl/manual/docblocks.html ? (This latter option would be nice as it'd provide reasonable behavior by default, although if necessary it could still be combined with a server option.)

Copy link
Contributor

@emaxx-googleemaxx-googleJun 24, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sounds good generally! Thanks.

Just a quick question:

Plaintext (default): the old escaping behaviour and no doxygen command parsing. The only thing I would still change is the paragraph termination with 2 newlines instead of 1.

A single blank line between paragraphs sounds OK. Or do you mean adding another blank line in between, or a trailing blank line at the end? - that might look extraneous. In any case, that sounds like a relatively minor implementation detail that we can discuss further as the PR goes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A single blank line between paragraphs sounds OK. Or do you mean adding another blank line in between, or a trailing blank line at the end? - that might look extraneous. In any case, that sounds like a relatively minor implementation detail that we can discuss further as the PR goes.

I mean in between.

Multiple newlines in a comment used to be "collapsed" to a single newline, resulting in removed paragraph separation in Markdown:

/** * paragraph1 line1 * paragraph1 line2 * * paragraph2 line1 */

Hover looked like this:

paragraph1 line1 paragraph1 line2paragraph2 line1

With the change, paragraphs are now seperated by an empty line (as per Markdown specification).

paragraph1 line1 paragraph1 line2paragraph2 line1

Trailing newlines are always trimmed. This did not change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Preserving the blank line between paragraphs sounds reasonable. Thanks for giving the examples.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am not sure about the provided design.
I actually wanted to use the new config from themarkup::Document class directly.
But the class is in theclangdSupport lib which does not have access to the config apparently.

Therefore I added a newasEscapedMarkdown function to the class.
This allows the users to determine what is needed based on the config and the client capabilities.

I fixed the missed formatting, but force pushed...
I think this canceled the approval of the CI again.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I realized that there were some things missing.

Whitespace is preserved now, therefore also the escaping implementation needed some additional changes.

Note: the whitespace preservation should not affect the client rendering.
Whitespace used to be canonicalized which leads to the markdown string already having spacing as the markdown renderer would show it on the client anyway.

Now whitespace is preserved but the client requests Markdown, hence it should be shown correctly.

@tcottin
Copy link
Author

ping

@mccakit
Copy link

ping@kadircet

@aaronliu0130
Copy link

@mccakit It looks like there's currently a reply from@tcottin to emaxx-google's last comment needed.

@mccakit
Copy link

Yeah, I saw it. Hope they look into it. This PR is like 2 months old, doxygen parsing at clang have some patches I found online that date back to 2019.

I'm currently using tcottins PR branch right now, because that is the only way of getting doxygen comments on my editor, which is Zed.

@tcottintcottinforce-pushed theimprove-markup-rendering branch frombf5a925 to39b891dCompareJuly 1, 2025 17:32
@tcottin
Copy link
Author

ping@emaxx-google

Copy link
Contributor

@emaxx-googleemaxx-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks! A couple of questions about newline handling, and a few minor nits.

///
/// \param C The character to check.
/// \param After The string that follows \p C .
// This is used to determine if \p C is part of a tag or an entity reference.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Use triple-slash here as well, for consistency with the rest of the comment.

tcottin reacted with thumbs up emoji
return needsLeadingEscapePlaintext(C, Before, After, StartsLine);
return needsLeadingEscapeMarkdown(C, After);
}

/// Escape a markdown text block. Ensures the punctuation will not introduce
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Update the comment (e.g., the second sentence is only applicable whenEscapeMarkdown istrue if I understand correctly).

tcottin reacted with thumbs up emoji
// split the input into lines, and escape each line separately.
llvm::StringRef Line, Rest;

bool StartsLineIntern = StartsLine;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Move the variable declaration below, to the place where it's initialized.

tcottin reacted with thumbs up emoji
!(Line.empty() && Rest.empty());
std::tie(Line, Rest) = Rest.split('\n')) {
llvm::StringRef Paragraph, Rest;
for (std::tie(Paragraph, Rest) = Input.split("\n\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Will this always work as expected when there's an even number of line breaks between paragraphs? It's not obvious from the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes it will still work.
In case of an even number of line breaks > 2 the if condition in the loop will evaluate to false and the additional empty lines will be ignored.
I added some comments about the logic and some new test cases to show it.

/// Escape a markdown text block. Ensures the punctuation will not introduce
/// any of the markdown constructs.
std::string renderText(llvm::StringRef Input, bool StartsLine) {
std::string renderText(llvm::StringRef Input, bool StartsLine,
bool EscapeMarkdown = false) {
std::string R;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

While we're here, it'd probably help doingR.reserve(Input.size())?

tcottin reacted with thumbs up emoji
// rid of the copies, if it turns out to be a bottleneck.
OS << "- " << indentLines(M) << '\n';
}
// We need a new line after list to terminate it in markdown.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: The comment says "a" new line, but the code adds two.

tcottin reacted with thumbs up emoji
C.Kind = Chunk::PlainText;
C.SpaceBefore = llvm::isSpace(Text.front());
C.SpaceAfter = llvm::isSpace(Text.back());
C.Contents = std::move(Contents);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit:std::move() isn't really making a difference here, when anstd::string is allocated anyway in order to copy the contents ofStringRef.

Ditto in other places before that applystd::move() toStringRef.

tcottin reacted with thumbs up emoji
@@ -58,11 +67,9 @@ class Paragraph : public Block {
Paragraph &appendSpace();

private:
typedef enum { PlainText, InlineCode, Bold, Emphasized } ChunkKind;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Just doenum ChunkKind { ... }.

tcottin reacted with thumbs up emoji
bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) const;
};

class ListItemParagraph : public Paragraph {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Unused?

tcottin reacted with thumbs up emoji
- foo
- baz
- foo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Related to the comment in the code - these additional blank lines within lists feel somewhat excessive. Is there a reason we have to add them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes.

We have to add them because we now end a bullet list with 2 new lines to have proper separation of bullet lists and normal paragraphs.
And I could not find a solution to do it differently.
We would need to know before we finish a bullet list if another bullet list follows.
And if another follows only add a single new line.
At least thats what I think we would need to solve.

Maybe you have another suggestion/idea?

Besides that, I dont think it is a big issue:

  1. It is only an issue if nested bullet lists are defined by a user outside of the documentation parsing.
    Currently, I did not see nested bullet lists used outside of this test.

  2. For the documentation markup, nested bullet lists are also not used.
    And for the future doxygen parsing,nested bullet lists are also not supported.

  3. This does not affect lists which are written as markdown in the documentation.

@tcottin
Copy link
Author

Thanks for the review!

I pushed some fixes.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@kadircetkadircetAwaiting requested review from kadircet

@emaxx-googleemaxx-googleAwaiting requested review from emaxx-google

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@tcottin@llvmbot@HighCommander4@mccakit@aaronliu0130@emaxx-google

[8]ページ先頭

©2009-2025 Movatter.jp