Properly separate paragraphs using an empty line between
Dont escape markdown syntax for markdown output except for HTML
Dont do any formatting for markdown because the client is handling the actual markdown rendering

[clangd] Improve Markup Rendering

8fadd8d

Copy link

github-actionsbot commentedMay 19, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using@ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by theLLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on theLLVM Discord or on theforums.

llvmbot added clang-tools-extra clangd labels

May 19, 2025

Copy link

Member

llvmbot commentedMay 19, 2025•
edited
Loading

@llvm/pr-subscribers-clangd

@llvm/pr-subscribers-clang-tools-extra

Author: None (tcottin)

Changes

This is a preparation for fixing clangd/clangd#529.

It changes the Markup rendering to markdown and plaintext.

Properly separate paragraphs using an empty line between
Dont escape markdown syntax for markdown output except for HTML
Dont do any formatting for markdown because the client is handling the actual markdown rendering

Patch is 41.40 KiB, truncated to 20.00 KiB below, full version:https://github.com/llvm/llvm-project/pull/140498.diff

7 Files Affected:

(modified) clang-tools-extra/clangd/Hover.cpp (+13-68)
(modified) clang-tools-extra/clangd/support/Markup.cpp (+147-105)
(modified) clang-tools-extra/clangd/support/Markup.h (+26-6)
(modified) clang-tools-extra/clangd/test/signature-help.test (+2-2)
(modified) clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp (+4-4)
(modified) clang-tools-extra/clangd/unittests/HoverTests.cpp (+51-24)
(modified) clang-tools-extra/clangd/unittests/support/MarkupTests.cpp (+167-47)

diff --git a/clang-tools-extra/clangd/Hover.cpp b/clang-tools-extra/clangd/Hover.cppindex 3ab3d89030520..88755733aa67c 100644--- a/clang-tools-extra/clangd/Hover.cpp+++ b/clang-tools-extra/clangd/Hover.cpp@@ -960,42 +960,6 @@ std::optional<HoverInfo> getHoverContents(const Attr *A, ParsedAST &AST) {   return HI; }-bool isParagraphBreak(llvm::StringRef Rest) {-  return Rest.ltrim(" \t").starts_with("\n");-}--bool punctuationIndicatesLineBreak(llvm::StringRef Line) {-  constexpr llvm::StringLiteral Punctuation = R"txt(.:,;!?)txt";--  Line = Line.rtrim();-  return !Line.empty() && Punctuation.contains(Line.back());-}--bool isHardLineBreakIndicator(llvm::StringRef Rest) {-  // '-'/'*' md list, '@'/'\' documentation command, '>' md blockquote,-  // '#' headings, '`' code blocks-  constexpr llvm::StringLiteral LinebreakIndicators = R"txt(-*@\>#`)txt";--  Rest = Rest.ltrim(" \t");-  if (Rest.empty())-    return false;--  if (LinebreakIndicators.contains(Rest.front()))-    return true;--  if (llvm::isDigit(Rest.front())) {-    llvm::StringRef AfterDigit = Rest.drop_while(llvm::isDigit);-    if (AfterDigit.starts_with(".") || AfterDigit.starts_with(")"))-      return true;-  }-  return false;-}--bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) {-  // Should we also consider whether Line is short?-  return punctuationIndicatesLineBreak(Line) || isHardLineBreakIndicator(Rest);-}- void addLayoutInfo(const NamedDecl &ND, HoverInfo &HI) {   if (ND.isInvalidDecl())     return;@@ -1601,51 +1565,32 @@ std::optional<llvm::StringRef> getBacktickQuoteRange(llvm::StringRef Line,   return Line.slice(Offset, Next + 1); }-void parseDocumentationLine(llvm::StringRef Line, markup::Paragraph &Out) {+void parseDocumentationParagraph(llvm::StringRef Text, markup::Paragraph &Out) {   // Probably this is appendText(Line), but scan for something interesting.-  for (unsigned I = 0; I < Line.size(); ++I) {-    switch (Line[I]) {+  for (unsigned I = 0; I < Text.size(); ++I) {+    switch (Text[I]) {     case '`':-      if (auto Range = getBacktickQuoteRange(Line, I)) {-        Out.appendText(Line.substr(0, I));+      if (auto Range = getBacktickQuoteRange(Text, I)) {+        Out.appendText(Text.substr(0, I));         Out.appendCode(Range->trim("`"), /*Preserve=*/true);-        return parseDocumentationLine(Line.substr(I + Range->size()), Out);+        return parseDocumentationParagraph(Text.substr(I + Range->size()), Out);       }       break;     }   }-  Out.appendText(Line).appendSpace();+  Out.appendText(Text); }  void parseDocumentation(llvm::StringRef Input, markup::Document &Output) {-  std::vector<llvm::StringRef> ParagraphLines;-  auto FlushParagraph = [&] {-    if (ParagraphLines.empty())-      return;-    auto &P = Output.addParagraph();-    for (llvm::StringRef Line : ParagraphLines)-      parseDocumentationLine(Line, P);-    ParagraphLines.clear();-  };+  llvm::StringRef Paragraph, Rest;+  for (std::tie(Paragraph, Rest) = Input.split("\n\n");+       !(Paragraph.empty() && Rest.empty());+       std::tie(Paragraph, Rest) = Rest.split("\n\n")) {-  llvm::StringRef Line, Rest;-  for (std::tie(Line, Rest) = Input.split('\n');-       !(Line.empty() && Rest.empty());-       std::tie(Line, Rest) = Rest.split('\n')) {--    // After a linebreak remove spaces to avoid 4 space markdown code blocks.-    // FIXME: make FlushParagraph handle this.-    Line = Line.ltrim();-    if (!Line.empty())-      ParagraphLines.push_back(Line);--    if (isParagraphBreak(Rest) || isHardLineBreakAfter(Line, Rest)) {-      FlushParagraph();-    }+    if (!Paragraph.empty())+      parseDocumentationParagraph(Paragraph, Output.addParagraph());   }-  FlushParagraph(); }- llvm::raw_ostream &operator<<(llvm::raw_ostream &OS,                               const HoverInfo::PrintedType &T) {   OS << T.Type;diff --git a/clang-tools-extra/clangd/support/Markup.cpp b/clang-tools-extra/clangd/support/Markup.cppindex 63aff96b02056..b1e6252e473f5 100644--- a/clang-tools-extra/clangd/support/Markup.cpp+++ b/clang-tools-extra/clangd/support/Markup.cpp@@ -11,7 +11,6 @@ #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h"-#include "llvm/Support/Compiler.h" #include "llvm/Support/raw_ostream.h" #include <cstddef> #include <iterator>@@ -56,80 +55,28 @@ bool looksLikeTag(llvm::StringRef Contents) {   return true; // Potentially incomplete tag. }-// Tests whether C should be backslash-escaped in markdown.-// The string being escaped is Before + C + After. This is part of a paragraph.-// StartsLine indicates whether `Before` is the start of the line.-// After may not be everything until the end of the line.-//-// It's always safe to escape punctuation, but want minimal escaping.-// The strategy is to escape the first character of anything that might start-// a markdown grammar construct.-bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,-                        bool StartsLine) {-  assert(Before.take_while(llvm::isSpace).empty());-  auto RulerLength = [&]() -> /*Length*/ unsigned {-    if (!StartsLine || !Before.empty())-      return false;-    llvm::StringRef A = After.rtrim();-    return llvm::all_of(A, [C](char D) { return C == D; }) ? 1 + A.size() : 0;-  };-  auto IsBullet = [&]() {-    return StartsLine && Before.empty() &&-           (After.empty() || After.starts_with(" "));-  };-  auto SpaceSurrounds = [&]() {-    return (After.empty() || llvm::isSpace(After.front())) &&-           (Before.empty() || llvm::isSpace(Before.back()));-  };-  auto WordSurrounds = [&]() {-    return (!After.empty() && llvm::isAlnum(After.front())) &&-           (!Before.empty() && llvm::isAlnum(Before.back()));-  };-+/// \brief Tests whether \p C should be backslash-escaped in markdown.+///+/// The MarkupContent LSP specification defines that `markdown` content needs to+/// follow GFM (GitHub Flavored Markdown) rules. And we can assume that markdown+/// is rendered on the client side. This means we do not need to escape any+/// markdown constructs.+/// The only exception is when the client does not support HTML rendering in+/// markdown. In that case, we need to escape HTML tags and HTML entities.+///+/// **FIXME:** handle the case when the client does support HTML rendering in+/// markdown. For this, the LSP server needs to check the+/// [supportsHtml capability](https://github.com/microsoft/language-server-protocol/issues/1344)+/// of the client.+///+/// \param C The character to check.+/// \param After The string that follows \p C . This is used to determine if \p C is+///             part of a tag or an entity reference.+/// \returns true if \p C should be escaped, false otherwise.+bool needsLeadingEscape(char C, llvm::StringRef After) {   switch (C) {-  case '\\': // Escaped character.-    return true;-  case '`': // Code block or inline code-    // Any number of backticks can delimit an inline code block that can end-    // anywhere (including on another line). We must escape them all.-    return true;-  case '~': // Code block-    return StartsLine && Before.empty() && After.starts_with("~~");-  case '#': { // ATX heading.-    if (!StartsLine || !Before.empty())-      return false;-    llvm::StringRef Rest = After.ltrim(C);-    return Rest.empty() || Rest.starts_with(" ");-  }-  case ']': // Link or link reference.-    // We escape ] rather than [ here, because it's more constrained:-    //   ](...) is an in-line link-    //   ]: is a link reference-    // The following are only links if the link reference exists:-    //   ] by itself is a shortcut link-    //   ][...] is an out-of-line link-    // Because we never emit link references, we don't need to handle these.-    return After.starts_with(":") || After.starts_with("(");-  case '=': // Setex heading.-    return RulerLength() > 0;-  case '_': // Horizontal ruler or matched delimiter.-    if (RulerLength() >= 3)-      return true;-    // Not a delimiter if surrounded by space, or inside a word.-    // (The rules at word boundaries are subtle).-    return !(SpaceSurrounds() || WordSurrounds());-  case '-': // Setex heading, horizontal ruler, or bullet.-    if (RulerLength() > 0)-      return true;-    return IsBullet();-  case '+': // Bullet list.-    return IsBullet();-  case '*': // Bullet list, horizontal ruler, or delimiter.-    return IsBullet() || RulerLength() >= 3 || !SpaceSurrounds();   case '<': // HTML tag (or autolink, which we choose not to escape)     return looksLikeTag(After);-  case '>': // Quote marker. Needs escaping at start of line.-    return StartsLine && Before.empty();   case '&': { // HTML entity reference     auto End = After.find(';');     if (End == llvm::StringRef::npos)@@ -142,10 +89,6 @@ bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After,     }     return llvm::all_of(Content, llvm::isAlpha);   }-  case '.': // Numbered list indicator. Escape 12. -> 12\. at start of line.-  case ')':-    return StartsLine && !Before.empty() &&-           llvm::all_of(Before, llvm::isDigit) && After.starts_with(" ");   default:     return false;   }@@ -156,8 +99,7 @@ bool needsLeadingEscape(char C, llvm::StringRef Before, llvm::StringRef After, std::string renderText(llvm::StringRef Input, bool StartsLine) {   std::string R;   for (unsigned I = 0; I < Input.size(); ++I) {-    if (needsLeadingEscape(Input[I], Input.substr(0, I), Input.substr(I + 1),-                           StartsLine))+    if (needsLeadingEscape(Input[I], Input.substr(I + 1)))       R.push_back('\\');     R.push_back(Input[I]);   }@@ -303,11 +245,12 @@ class CodeBlock : public Block { std::string indentLines(llvm::StringRef Input) {   assert(!Input.ends_with("\n") && "Input should've been trimmed.");   std::string IndentedR;-  // We'll add 2 spaces after each new line.+  // We'll add 2 spaces after each new line which is not followed by another new line.   IndentedR.reserve(Input.size() + Input.count('\n') * 2);-  for (char C : Input) {+  for (size_t I = 0; I < Input.size(); ++I) {+    char C = Input[I];     IndentedR += C;-    if (C == '\n')+    if (C == '\n' && (((I + 1) < Input.size()) && (Input[I + 1] != '\n')))       IndentedR.append("  ");   }   return IndentedR;@@ -348,20 +291,24 @@ void Paragraph::renderMarkdown(llvm::raw_ostream &OS) const {     if (C.SpaceBefore || NeedsSpace)       OS << " ";     switch (C.Kind) {-    case Chunk::PlainText:+    case ChunkKind::PlainText:       OS << renderText(C.Contents, !HasChunks);       break;-    case Chunk::InlineCode:+    case ChunkKind::InlineCode:       OS << renderInlineBlock(C.Contents);       break;+    case ChunkKind::Bold:+      OS << "**" << renderText(C.Contents, !HasChunks) << "**";+      break;+    case ChunkKind::Emphasized:+      OS << "*" << renderText(C.Contents, !HasChunks) << "*";+      break;     }     HasChunks = true;     NeedsSpace = C.SpaceAfter;   }-  // Paragraphs are translated into markdown lines, not markdown paragraphs.-  // Therefore it only has a single linebreak afterwards.-  // VSCode requires two spaces at the end of line to start a new one.-  OS << "  \n";+  // A paragraph in markdown is separated by a blank line.+  OS << "\n\n"; }  std::unique_ptr<Block> Paragraph::clone() const {@@ -370,8 +317,8 @@ std::unique_ptr<Block> Paragraph::clone() const {  /// Choose a marker to delimit `Text` from a prioritized list of options. /// This is more readable than escaping for plain-text.-llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,-                             llvm::StringRef Text) {+llvm::StringRef Paragraph::chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,+                                        llvm::StringRef Text) const {   // Prefer a delimiter whose characters don't appear in the text.   for (llvm::StringRef S : Options)     if (Text.find_first_of(S) == llvm::StringRef::npos)@@ -379,18 +326,94 @@ llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,   return Options.front(); }+bool Paragraph::punctuationIndicatesLineBreak(llvm::StringRef Line) const{+  constexpr llvm::StringLiteral Punctuation = R"txt(.:,;!?)txt";++  Line = Line.rtrim();+  return !Line.empty() && Punctuation.contains(Line.back());+}++bool Paragraph::isHardLineBreakIndicator(llvm::StringRef Rest) const {+  // '-'/'*' md list, '@'/'\' documentation command, '>' md blockquote,+  // '#' headings, '`' code blocks, two spaces (markdown force newline)+  constexpr llvm::StringLiteral LinebreakIndicators = R"txt(-*@\>#`)txt";++  Rest = Rest.ltrim(" \t");+  if (Rest.empty())+    return false;++  if (LinebreakIndicators.contains(Rest.front()))+    return true;++  if (llvm::isDigit(Rest.front())) {+    llvm::StringRef AfterDigit = Rest.drop_while(llvm::isDigit);+    if (AfterDigit.starts_with(".") || AfterDigit.starts_with(")"))+      return true;+  }+  return false;+}++bool Paragraph::isHardLineBreakAfter(llvm::StringRef Line,+                                     llvm::StringRef Rest) const {+  // In Markdown, 2 spaces before a line break forces a line break.+  // Add a line break for plaintext in this case too.+  // Should we also consider whether Line is short?+  return Line.ends_with("  ") || punctuationIndicatesLineBreak(Line) ||+         isHardLineBreakIndicator(Rest);+}+ void Paragraph::renderPlainText(llvm::raw_ostream &OS) const {   bool NeedsSpace = false;+  std::string ConcatenatedText;+  llvm::raw_string_ostream ConcatenatedOS(ConcatenatedText);+   for (auto &C : Chunks) {++    if (C.Kind == ChunkKind::PlainText) {+      if (C.SpaceBefore || NeedsSpace)+        ConcatenatedOS << ' ';++      ConcatenatedOS << C.Contents;+      NeedsSpace = llvm::isSpace(C.Contents.back()) || C.SpaceAfter;+      continue;+    }+     if (C.SpaceBefore || NeedsSpace)-      OS << " ";+      ConcatenatedOS << ' ';     llvm::StringRef Marker = "";-    if (C.Preserve && C.Kind == Chunk::InlineCode)+    if (C.Preserve && C.Kind == ChunkKind::InlineCode)       Marker = chooseMarker({"`", "'", "\""}, C.Contents);-    OS << Marker << C.Contents << Marker;+    else if (C.Kind == ChunkKind::Bold)+      Marker = "**";+    else if (C.Kind == ChunkKind::Emphasized)+      Marker = "*";+    ConcatenatedOS << Marker << C.Contents << Marker;     NeedsSpace = C.SpaceAfter;   }-  OS << '\n';++  // We go through the contents line by line to handle the newlines+  // and required spacing correctly.+  llvm::StringRef Line, Rest;++  for (std::tie(Line, Rest) =+           llvm::StringRef(ConcatenatedText).trim().split('\n');+       !(Line.empty() && Rest.empty());+       std::tie(Line, Rest) = Rest.split('\n')) {++    Line = Line.ltrim();+    if (Line.empty())+      continue;++    OS << canonicalizeSpaces(Line);++    if (isHardLineBreakAfter(Line, Rest))+      OS << '\n';+    else if (!Rest.empty())+      OS << ' ';+  }++  // Paragraphs are separated by a blank line.+  OS << "\n\n"; }  BulletList::BulletList() = default;@@ -398,12 +421,13 @@ BulletList::~BulletList() = default;  void BulletList::renderMarkdown(llvm::raw_ostream &OS) const {   for (auto &D : Items) {+    std::string M = D.asMarkdown();     // Instead of doing this we might prefer passing Indent to children to get     // rid of the copies, if it turns out to be a bottleneck.-    OS << "- " << indentLines(D.asMarkdown()) << '\n';+    OS << "- " << indentLines(M) << '\n';   }   // We need a new line after list to terminate it in markdown.-  OS << '\n';+  OS << "\n\n"; }  void BulletList::renderPlainText(llvm::raw_ostream &OS) const {@@ -412,6 +436,7 @@ void BulletList::renderPlainText(llvm::raw_ostream &OS) const {     // rid of the copies, if it turns out to be a bottleneck.     OS << "- " << indentLines(D.asPlainText()) << '\n';   }+  OS << '\n'; }  Paragraph &Paragraph::appendSpace() {@@ -420,29 +445,44 @@ Paragraph &Paragraph::appendSpace() {   return *this; }-Paragraph &Paragraph::appendText(llvm::StringRef Text) {-  std::string Norm = canonicalizeSpaces(Text);-  if (Norm.empty())+Paragraph &Paragraph::appendChunk(llvm::StringRef Contents, ChunkKind K) {+  if (Contents.empty())     return *this;   Chunks.emplace_back();   Chunk &C = Chunks.back();-  C.Contents = std::move(Norm);-  C.Kind = Chunk::PlainText;-  C.SpaceBefore = llvm::isSpace(Text.front());-  C.SpaceAfter = llvm::isSpace(Text.back());+  C.Contents = std::move(Contents);+  C.Kind = K;   return *this; }+Paragraph &Paragraph::appendText(llvm::StringRef Text) {+  if (!Chunks.empty() && Chunks.back().Kind == ChunkKind::PlainText) {+    Chunks.back().Contents += std::move(Text);+    return *this;+  }++  return appendChunk(Text, ChunkKind::PlainText);+}++Paragraph &Paragraph::appendEmphasizedText(llvm::StringRef Text) {+  return appendChunk(canonicalizeSpaces(std::move(Text)),+                     ChunkKind::Emphasized);+}++Paragraph &Paragraph::appendBoldText(llvm::StringRef Text) {+  return appendChunk(canonicalizeSpaces(std::move(Text)), ChunkKind::Bold);+}+ Paragraph &Paragraph::appendCode(llvm::StringRef Code, bool Preserve) {   bool AdjacentCode =-      !Chunks.empty() && Chunks.back().Kind == Chunk::InlineCode;+      !Chunks.empty() && Chunks.back().Kind == ChunkKind::InlineCode;   std::string Norm = canonicalizeSpaces(std::move(Code));   if (Norm.empty())     return *this;   Chunks.emplace_back();   Chunk &C = Chunks.back();   C.Contents = std::move(Norm);-  C.Kind = Chunk::InlineCode;+  C.Kind = ChunkKind::InlineCode;   C.Preserve = Preserve;   // Disallow adjacent code spans without spaces, markdown can't render them.   C.SpaceBefore = AdjacentCode;@@ -475,7 +515,9 @@ Paragraph &Document::addParagraph() {   return *static_cast<Paragraph *>(Children.back().get()); }-void Document::addRuler() { Children.push_back(std::make_unique<Ruler>()); }+void Document::addRuler() {+  Children.push_back(std::make_unique<Ruler>());+}  void Document::addCodeBlock(std::string Code, std::string Language) {   Children.emplace_back(diff --git a/clang-tools-extra/clangd/support/Markup.h b/clang-tools-extra/clangd/support/Markup.hindex 3a869c49a2cbb..a74fade13d115 100644--- a/clang-tools-extra/clangd/support/Markup.h+++ b/clang-tools-extra/clangd/support/Markup.h@@ -49,6 +49,12 @@ class Paragraph : public Block {   /// Append plain text to the end of the string.   Paragraph &appendText(llvm::StringRef Text);+  /// Append emphasized text, this translates to the * block in markdown.+  Paragraph &appendEmphasizedText(llvm::StringRef Text);++  /// Append bold text, this translates to the ** block in markdown.+  Paragraph &appendBoldText(llvm::StringRef Text);+   /// Append inline code, this translates to the ` block in markdown.   /// \p Preserve indicates the code span must be apparent even in plaintext.   Paragraph &appendCode(llvm::StringRef Code, bool Preserve = false);@@ -58,11 +64,9 @@ class Paragraph : public Block {   Paragraph &appendSpace();  private:+  typedef enum { PlainText, InlineCode, Bold, Emphasized } ChunkKind;   struct Chunk {-    enum {-      PlainText,-      InlineCode,-    } Kind = PlainText;+    ChunkKind Kind = PlainText;     // Preserve chunk markers in plaintext.     bool Preserve = false;     std::string Contents;@@ -73,6 +77,19 @@ class Paragraph : public Block {     bool SpaceAfter = false;   };   std::vector<Chunk> Chunks;++  Paragraph &appendChunk(llvm::StringRef Contents, ChunkKind K);++  llvm::StringRef chooseMarker(llvm::ArrayRef<llvm::StringRef> Options,+                               llvm::StringRef Text) const;+  bool punctuationIndicatesLineBreak(llvm::StringRef Line) const;+  bool isHardLineBreakIndicator(llvm::StringRef Rest) const;+  bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) const;+};++class ListItemParagraph : public Paragraph {+public:+  void renderMarkdown(llvm::raw_ostream &OS) const override; };  /// Represents a sequence of one or more documents. Knows how to print them in a@@ -82,6 +99,9 @@ class BulletList : public Block {   BulletList();...[truncated]

tcottin mentioned this pull request

May 19, 2025

[clangd][WIP] Add doxygen parsing using standalone doxygen parser#128591

Closed

Copy link

Author

tcottin commentedMay 30, 2025

ping

Note: the next patches forclangd/clangd#529 are in my fork

HighCommander4 requested a review fromkadircet

May 30, 2025 18:14

Copy link

github-actionsbot commentedMay 30, 2025•
edited
Loading

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

Collaborator

HighCommander4 commentedMay 30, 2025

Carrying over review request to@kadircet from the original combined patch at#128591 (comment):

@kadircet, you were involved with earlier rounds of review on the old patch (https://reviews.llvm.org/D143112) and design discussions in the issue (clangd/clangd#529) -- do you by chance have the cycles to pick up this review?

[clangd] fix formatting

1fe2007

emaxx-google reviewed

Jun 2, 2025

View reviewed changes

clang-tools-extra/clangd/support/Markup.h

		@@ -118,8 +138,8 @@ class Document {
		BulletList &addBulletList();

		/// Doesn't contain any trailing newlines.
		/// We try to make the markdown human-readable, e.g. avoid extra escaping.
		/// At least one client (coc.nvim) displays the markdown verbatim!

Copy link

Contributor

emaxx-googleJun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This deleted comment refers tococ.nvims rendering of markdown - would this PR cause a change of behavior for that use case? If I understand correctly, the prior work intentionally targeted clients, like this, with limited markdown rendering support.

If that's the case and we somehow need to find a balance between Markdown capabilities of different clients, could theMarkdownClientCapabilities flags be used to guard the necessary logic?

Copy link

Author

tcottinJun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This patch changes 2 things for markdown:

The documentation string is taken as is without escaping anything (except for html tags and entity references)
Paragraphs are separated with an empty line

My idea was to let the client handle the markdown rendering completely.
Therefore no escaping is needed and even false because if markdown annotations are escaped, the client will not render it as expected.
The same for the empty line between paragraphs: this is how paragraphs are separated in markdown.

I think it is save to assume that the client can handle unescaped markdown.
According to theLSP specification,
if a clients capability is specifying'markdown' asMarkupKind for itMarkupContent, the "content should follow the GitHub Flavored Markdown Specification".

Now regardingcoc.nvim: I dont use it and do not know the current behaviour.
Reading through e.g.clangd/coc-clangd#48 though, I understood thatcoc.nvim requestsMarkupContent with'markdown' but does not really render markdown.
Hence I would assume that in this case leaving out the escaping would result in just not showing the escaping character.
Regarding the empty newlines between paragraphs I would also expect that the empty line is just shown, creating better visual separations between paragraphs

But:@emaxx-google do you usecoc.nvim and could you give this patch a try?
Maybe my assumptions are false and it causes issues I did not see yet.

Regarding the capabilities: I think currently there is only'plaintext' and'markdown'.
Additionally there is the option to allow HTML tags when markdown is selected.
Other than that I dont know how we could handle this.
But I think we don't need to actually, because as stated above, if the client specifies to support markdown, it should handle it correctly.
If it can't handle it,'plaintext' is always an option.

Copy link

Contributor

emaxx-googleJun 3, 2025•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If I understand correctly, this PR assumes that the input documentation string is written using Markdown. The current implementation in Clangd, however, is much more conservative in its assumptions - it only recognizes basic syntax features like backticks. Hence when rendering formarkdown, everything else is escaped because the assumption is it's actually a plain text, so that all characters that'd (accidentally?) trigger special semantics in the Markdown client-side renderer are mangled to be not treated as such:

// Tests whether C should be backslash-escaped in markdown.<...>// It's always safe to escape punctuation, but want minimal escaping.// The strategy is to escape the first character of anything that might start// a markdown grammar construct.

A change to pass documentation strings as-is would cause regressions on code bases where comments aren't written with the markdown syntax in mind.

P.S. As for coc-nvim - I personally don't use it, but if we're to do radical changes to how things are output, it's worth it checking we don't regress the UI for this known case.

Copy link

Author

tcottinJun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the reply. I did not think about it this way.
The issue that you dont want your documentation to be rendered as markdown by the client seems different from the coc.nvim issue.

The old behaviour was to treat all documentation as non-markdown independent of client render capabilities.
Hence to even make this work for clients requesting markdown as the MarkupKind, markdown syntax in the documentation string was escaped.
This way, only information which was added by clangd (like the declaration of the hover point) is rendered in markdown, if the client supports it.

With this change, the documentation is taken as-is which leads to rendered markdown on clients requesting markdown.

If viewed this way, the escaping behaviour is actually not a client decision and independent of the clients capabilities.
Therefore we could introduce a server option to control the escaping behaviour I think.

Just for reference:doxygen supports markdown since 1.8.0. There is aMARKDOWN_SUPPORT configuration option which is set toYES by default.
This means in a doxygen comment, all markdown syntax will be rendered as markdown in the final documentation by default.
Hence the default doxygen behaviour is the same as the behaviour proposed with this change.

I would argue that for the markdown rendering behaviour of clangd it should be the same:
the proposed behaviour should be the default but could be switched off with a server option.

Copy link

Contributor

emaxx-googleJun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think we should distinguish between the input documentation's syntax and the client's rendering capabilities. The latter defines the syntax in which it expects the data received from the server, and not that all input is forced to be treated as if written in Markdown. As said, we're suspecting that source comments in a lot of projects would be rendered incorrectly if they'd be all interpreted as verbatim Markdown as a blanket decision.

So guarding this "no escaping" behavior by an option is one possibility, yes. Another possible direction could be to decide based on the syntax used in the comments - i.e., recognizing those distinctive comment syntaxes fromhttps://www.doxygen.nl/manual/docblocks.html ? (This latter option would be nice as it'd provide reasonable behavior by default, although if necessary it could still be combined with a server option.)

Copy link

Contributor

emaxx-googleJun 24, 2025•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sounds good generally! Thanks.

Just a quick question:

Plaintext (default): the old escaping behaviour and no doxygen command parsing. The only thing I would still change is the paragraph termination with 2 newlines instead of 1.

A single blank line between paragraphs sounds OK. Or do you mean adding another blank line in between, or a trailing blank line at the end? - that might look extraneous. In any case, that sounds like a relatively minor implementation detail that we can discuss further as the PR goes.

Copy link

Author

tcottinJun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A single blank line between paragraphs sounds OK. Or do you mean adding another blank line in between, or a trailing blank line at the end? - that might look extraneous. In any case, that sounds like a relatively minor implementation detail that we can discuss further as the PR goes.

I mean in between.

Multiple newlines in a comment used to be "collapsed" to a single newline, resulting in removed paragraph separation in Markdown:

/** * paragraph1 line1 * paragraph1 line2 * * paragraph2 line1 */

Hover looked like this:

paragraph1 line1 paragraph1 line2paragraph2 line1

With the change, paragraphs are now seperated by an empty line (as per Markdown specification).

paragraph1 line1 paragraph1 line2paragraph2 line1

Trailing newlines are always trimmed. This did not change.

Copy link

Contributor

emaxx-googleJul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Preserving the blank line between paragraphs sounds reasonable. Thanks for giving the examples.

Copy link

Author

tcottinJul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am not sure about the provided design.
I actually wanted to use the new config from themarkup::Document class directly.
But the class is in theclangdSupport lib which does not have access to the config apparently.

Therefore I added a newasEscapedMarkdown function to the class.
This allows the users to determine what is needed based on the config and the client capabilities.

I fixed the missed formatting, but force pushed...
I think this canceled the approval of the CI again.

Copy link

Author

tcottinJul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I realized that there were some things missing.

Whitespace is preserved now, therefore also the escaping implementation needed some additional changes.

Note: the whitespace preservation should not affect the client rendering.
Whitespace used to be canonicalized which leads to the markdown string already having spacing as the markdown renderer would show it on the client anyway.

Now whitespace is preserved but the client requests Markdown, hence it should be shown correctly.

Copy link

Author

tcottin commentedJun 7, 2025

ping

Copy link

mccakit commentedJun 20, 2025

ping@kadircet

Copy link

aaronliu0130 commentedJun 21, 2025

@mccakit It looks like there's currently a reply from@tcottin to emaxx-google's last comment needed.

Copy link

mccakit commentedJun 21, 2025

Yeah, I saw it. Hope they look into it. This PR is like 2 months old, doxygen parsing at clang have some patches I found online that date back to 2019.

I'm currently using tcottins PR branch right now, because that is the only way of getting doxygen comments on my editor, which is Zed.

liftchampion mentioned this pull request

Jun 27, 2025

Doxygen parsing missingclangd/clangd#529

Open

[clangd] introduce CommentFormat option

39b891d

tcottin force-pushed theimprove-markup-rendering branch frombf5a925 to39b891dCompare

July 1, 2025 17:32

[clangd] Escaping adaption due to whitespace preservation

dd5659e

Copy link

Author

tcottin commentedJul 10, 2025

ping@emaxx-google

[clangd] Set test as unsupported for Windows

Ditto in other places before that applystd::move() toStringRef.

clang-tools-extra/clangd/support/Markup.h Outdated

		@@ -58,11 +67,9 @@ class Paragraph : public Block {
		Paragraph &appendSpace();

		private:
		typedef enum { PlainText, InlineCode, Bold, Emphasized } ChunkKind;

Copy link

Contributor

emaxx-googleJul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Just doenum ChunkKind { ... }.

clang-tools-extra/clangd/support/Markup.h Outdated

		bool isHardLineBreakAfter(llvm::StringRef Line, llvm::StringRef Rest) const;
		};

		class ListItemParagraph : public Paragraph {

Copy link

Contributor

emaxx-googleJul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: Unused?

clang-tools-extra/clangd/unittests/support/MarkupTests.cpp

		- foo
		- baz
		- foo

Copy link

Contributor

emaxx-googleJul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Related to the comment in the code - these additional blank lines within lists feel somewhat excessive. Is there a reason we have to add them?

Copy link

Author

tcottinJul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes.

We have to add them because we now end a bullet list with 2 new lines to have proper separation of bullet lists and normal paragraphs.
And I could not find a solution to do it differently.
We would need to know before we finish a bullet list if another bullet list follows.
And if another follows only add a single new line.
At least thats what I think we would need to solve.

Maybe you have another suggestion/idea?

Besides that, I dont think it is a big issue:

It is only an issue if nested bullet lists are defined by a user outside of the documentation parsing.
Currently, I did not see nested bullet lists used outside of this test.
For the documentation markup, nested bullet lists are also not used.
And for the future doxygen parsing,nested bullet lists are also not supported.
This does not affect lists which are written as markdown in the documentation.

tcottin requested a review fromemaxx-google

July 12, 2025 20:33

Copy link

Author

tcottin commentedJul 12, 2025

Thanks for the review!

I pushed some fixes.

[clangd] Fix review findings

4ec872d

tcottin force-pushed theimprove-markup-rendering branch from2d5faec to4ec872dCompare

July 12, 2025 20:54

AlvaroParker mentioned this pull request

Jul 16, 2025

Support for docs comments in C++zed-industries/zed#18169

Open

1 task

Labels

clang-tools-extra clangd

6 participants

Movatterモバイル変換

[clangd] Improve Markup Rendering#140498

Are you sure you want to change the base?

[clangd] Improve Markup Rendering#140498

Conversation

tcottin commentedMay 19, 2025

Uh oh!

github-actionsbot commentedMay 19, 2025

Uh oh!

llvmbot commentedMay 19, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

tcottin commentedMay 30, 2025

Uh oh!

github-actionsbot commentedMay 30, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

HighCommander4 commentedMay 30, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emaxx-googleJun 3, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emaxx-googleJun 24, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tcottin commentedJun 7, 2025

Uh oh!

mccakit commentedJun 20, 2025

Uh oh!

aaronliu0130 commentedJun 21, 2025

Uh oh!

mccakit commentedJun 21, 2025

Uh oh!

tcottin commentedJul 10, 2025

Uh oh!

emaxx-google left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

llvmbot commentedMay 19, 2025•
edited
Loading

github-actionsbot commentedMay 30, 2025•
edited
Loading

emaxx-googleJun 3, 2025•
edited
Loading

emaxx-googleJun 24, 2025•
edited
Loading