
Updated:(deps): Bump html-to-markdown-rs from 3.3.3 to 3.4.0 in /src #116

Merged
Sewer56 merged 1 commit into main from dependabot/cargo/src/html-to-markdown-rs-3.4.0 on May 12, 2026

Conversation


dependabot Bot commented on behalf of github May 11, 2026

Bumps html-to-markdown-rs from 3.3.3 to 3.4.0.

Release notes

Sourced from html-to-markdown-rs's releases.

v3.4.0

html-to-markdown 3.4.0 — high-performance HTML to Markdown converter with a Rust core and polyglot bindings (Python, Node/TypeScript, Ruby, PHP, Go, Java, C#, Elixir, R, WebAssembly, C FFI).

Install

| Language | Command |
| --- | --- |
| Rust | cargo add html-to-markdown-rs |
| Python | pip install html-to-markdown |
| Node / TS | npm install @kreuzberg/html-to-markdown |
| WASM | npm install @kreuzberg/html-to-markdown-wasm |
| Ruby | gem install html-to-markdown |
| PHP | pie install kreuzberg-dev/html-to-markdown-rs |
| Go | go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3 |
| Java | Maven Central: dev.kreuzberg:html-to-markdown:3.4.0 |
| C# | dotnet add package KreuzbergDev.HtmlToMarkdown |
| Elixir | {:html_to_markdown, "~> 3.4"} in mix.exs |
| R | install.packages("htmltomarkdown") |
| Homebrew (CLI) | brew install kreuzberg-dev/tap/html-to-markdown |
| Homebrew (lib) | brew install kreuzberg-dev/tap/libhtml-to-markdown |

Added

  • Homebrew distribution for html-to-markdown (CLI) and libhtml-to-markdown (FFI library + headers + pkg-config + CMake configs). Pre-built tarballs for macOS arm64/x86_64 and Linux arm64/x86_64; install with brew install kreuzberg-dev/tap/html-to-markdown.
  • WASM bundles for all four wasm-pack targets (web, bundler, nodejs, deno) under @kreuzberg/html-to-markdown-wasm.
  • C# NuGet package KreuzbergDev.HtmlToMarkdown with native runtimes for linux-x64, linux-arm64, osx-x64, osx-arm64, win-x64, win-arm64.
  • Java Maven Central package dev.kreuzberg:html-to-markdown bundling native libraries for the same six platforms via META-INF/native/<rid>/.
  • Elixir Hex package with rustler_precompiled NIFs for Linux + macOS (NIF 2.16/2.17 × 3 platforms); released artifacts download at first run.
  • PHP PIE pre-built archives for PHP 8.2/8.3/8.4/8.5 × 6 platforms — pie install kreuzberg-dev/html-to-markdown-rs no longer requires building from source.
  • CLI panic guard — conversion failures inside the CLI now surface as actionable errors via panic::catch_unwind instead of partial output + Rust backtrace.
  • HtmlVisitor parity across all bindings — Python, Node/TypeScript, Ruby, PHP, Go, Java, C#, Elixir, R, and WASM all expose the visitor interface with visit_element_start/visit_text/visit_element_end and VisitResult::{Continue, Skip, Custom} semantics matching the Rust core.
  • Polyglot codegen via alef — bindings, e2e tests, and READMEs for all 11 target languages are generated from a single alef.toml + Rust source of truth, eliminating drift across the polyglot surface.
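
The CLI panic guard listed above relies on panic::catch_unwind from the Rust standard library. As a rough illustration of that pattern only, the sketch below wraps a conversion call and turns a panic into an error string. The names convert and guarded_convert, the "<boom>" trigger, and the error wording are hypothetical and are not taken from the html-to-markdown source.

```rust
use std::panic;

/// Hypothetical stand-in for a conversion routine that may panic on bad input.
fn convert(html: &str) -> String {
    if html.contains("<boom>") {
        panic!("unexpected tag");
    }
    html.replace("<b>", "**").replace("</b>", "**")
}

/// Runs `convert` behind `catch_unwind` and maps a panic to an error message
/// instead of letting it unwind out of the caller.
fn guarded_convert(html: &str) -> Result<String, String> {
    panic::catch_unwind(|| convert(html)).map_err(|payload| {
        // Panic payloads are usually `&str` or `String`; fall back otherwise.
        let msg = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic".to_string());
        format!("conversion failed: {msg}")
    })
}
```

With this sketch, guarded_convert("<b>hi</b>") returns Ok with the converted text, while guarded_convert("<boom>") returns an Err carrying the panic message. The default panic hook still prints to stderr unless it is replaced.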

Fixed

  • #348: OutputFormat::Plain ignored HtmlVisitor callbacks. The plain-text walker (crates/html-to-markdown/src/converter/plain_text.rs) ran the markdown pipeline first, then discarded its output and re-traversed the DOM via a visitor-less walk_plain, so VisitResult::Custom/Skip returned from visit_element_end/visit_text was silently dropped for Plain. Threaded a WalkState carrying the visitor through the plain walker so element/text hooks fire and their results are honoured.
  • #347: <img src> URLs not escaped, breaking CommonMark round-trip. crates/html-to-markdown/src/converter/handlers/image.rs emitted src raw, while <a href> already wrapped spaces/parens in angle brackets. Image renderer now uses the same three-branch escaping as links: empty → <>, contains space/newline → <URL>, unbalanced parens → \(/\) escaping.
  • #336 — large MS Word HTML truncated when <td><p class='MsoNormal'>…</td> appears as the leading cell. The tl parser absorbs subsequent <td> and document content into the unclosed <p>, nesting the rest of the DOM inside the first table cell. Extended has_inline_block_misnest in converter/preprocessing_helpers.rs with a has_p_ancestor check that detects td/tr/th under <p> (structurally impossible in valid HTML) and triggers the existing html5ever repair path.
  • Split closing tags </tagname\n> corrupted DOM and dropped content. JSX-style HTML (closing-tag > on the next line) caused the tl parser to leave elements unclosed, which silently absorbed siblings and dropped entire sections — affecting #127 (MW841 product headings missing from multilingual page), #143 (word-wrap merging nested link list items), and #121 (SPA menu nesting). New normalize_split_closing_tags preprocessing pass collapses such patterns to </tagname> before parsing, wired into all four preprocessing branches in converter/main.rs.
  • Tables now emit padded, aligned columns. Each cell is padded to the widest cell in its column; the separator row uses max(3, col_width) dashes per column. * and _ are escaped in table cells regardless of escape_misc. Fixes the gh-140 fixture parity and produces CommonMark-conformant tables out of the box.
  • #339 — bogus HTML comment endings dropped following content. The astral-tl parser silently discarded every byte after <!-- /// ---> or any --[-]+> comment terminator. New normalize_bogus_comment_endings preprocessing pass rewrites such sequences to --> before parsing; wired into the html5ever-repair and inline-block-misnest fallback paths too.
  • #340 — npm pre-release versions clobbered the latest dist-tag. Pre-release versions (matching -(rc|beta|alpha|pre|dev)) now publish under the next dist-tag, so npm install @kreuzberg/html-to-markdown-node no longer pulls a 3.4.0-rc over a stable 3.3.x.
  • #337: from html_to_markdown import HeadingStyle raised TypeError. The package now re-exports the native PyO3 enums directly from _html_to_markdown and adds uppercase aliases (HeadingStyle.ATX, CodeBlockStyle.BACKTICKS) so both naming conventions satisfy ConversionOptions(heading_style=…).
  • #334 — Ruby HtmlToMarkdown.convert(html, options) raised TypeError on every call with options. The wrapper passed a ConversionOptions object to the FFI, but the generated Rust function expects Option<String> JSON. Wrapper now serialises the options hash to JSON before crossing the FFI boundary.
  • #332: default-features = false Rust build broken. Bare #[serde(...)] and #[derive(Serialize, Deserialize)] on core types in src/types/{document,tables,result,warnings}.rs and src/options/conversion.rs are now feature-gated behind #[cfg_attr(feature = "serde", ...)]. CI now runs a cargo check --no-default-features matrix to prevent regressions.
  • #331 — visitor element_start/element_end events mispaired for hyphenated/namespaced custom tags. The repair_with_html5ever fallback re-parsed under HTML5 semantics, which discard XML-style self-closing on unknown elements. The repair path now pre-expands XML self-closing tags on non-void elements to explicit open+close pairs before the HTML5 parse.
  • PHP visitor marshaling — visitor callbacks now correctly marshal arguments and handle array return values; setVisitor() method added to ConversionOptions.
  • Elixir metadata serialization — metadata maps now serialize as JSON instead of Elixir debug format.
  • WASM Vitest environment — WASM module loading now correctly handles Node.js module format in Vitest test environments.
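
The three-branch URL escaping described for #347 can be sketched as follows. This is an illustration of the rule as stated in the release note, not the code in handlers/image.rs. The function names escape_image_src and parens_balanced are hypothetical, and escaping every parenthesis in the unbalanced case is an assumption.

```rust
/// Returns true when every ')' closes an earlier '(' and none are left open.
fn parens_balanced(s: &str) -> bool {
    let mut depth: i32 = 0;
    for c in s.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    depth == 0
}

/// Hypothetical sketch of the three-branch escaping for a link or image URL.
fn escape_image_src(url: &str) -> String {
    if url.is_empty() {
        // Branch 1: empty destination becomes `<>`.
        return "<>".to_string();
    }
    if url.contains(' ') || url.contains('\n') {
        // Branch 2: whitespace forces the angle-bracket form.
        return format!("<{url}>");
    }
    if !parens_balanced(url) {
        // Branch 3: unbalanced parentheses are backslash-escaped
        // (assumption: all parentheses in the URL are escaped).
        return url.replace('(', "\\(").replace(')', "\\)");
    }
    // Otherwise the URL can be emitted unchanged.
    url.to_string()
}
```

Under these assumptions, "a b.png" becomes "<a b.png>", "img(1.png" becomes "img\(1.png", and a URL with balanced parentheses such as "img(1).png" passes through untouched.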

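The table fix above states three rules: pad each cell to the widest cell in its column, use max(3, col_width) dashes in the separator row, and always escape * and _ inside cells. A minimal sketch of those rules follows. The function name render_table and the choice to treat the first row as the header are assumptions, and width is measured in characters rather than display columns.

```rust
/// Hypothetical sketch: renders rows as a padded Markdown table.
/// The first row is treated as the header.
fn render_table(rows: &[Vec<&str>]) -> String {
    // Escape `*` and `_` in every cell before measuring widths.
    let escaped: Vec<Vec<String>> = rows
        .iter()
        .map(|row| {
            row.iter()
                .map(|cell| cell.replace('*', "\\*").replace('_', "\\_"))
                .collect()
        })
        .collect();

    // Column width = character count of the widest cell in that column.
    let cols = escaped.iter().map(|row| row.len()).max().unwrap_or(0);
    let mut widths = vec![0usize; cols];
    for row in &escaped {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }

    let mut out = String::new();
    for (idx, row) in escaped.iter().enumerate() {
        out.push('|');
        for (i, width) in widths.iter().enumerate() {
            // Missing trailing cells are rendered as empty strings.
            let cell = row.get(i).map(String::as_str).unwrap_or("");
            out.push_str(&format!(" {:<w$} |", cell, w = *width));
        }
        out.push('\n');
        if idx == 0 {
            // Separator row: at least three dashes per column.
            out.push('|');
            for width in &widths {
                out.push_str(&format!(" {} |", "-".repeat((*width).max(3))));
            }
            out.push('\n');
        }
    }
    out
}
```

For a header row of Name and Qty and a data row of a*b and 10, this sketch produces a four-dash and a three-dash separator, and the escaped cell a\*b lines up with the Name column.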
... (truncated)

Changelog

Sourced from html-to-markdown-rs's changelog.

[3.4.0] - 2026-05-09

Fixed

  • R e2e result wrapping: result_is_r_list configured to suppress jsonlite double-wrapping of conversion results.

Changed

  • pnpm v11 — migrated from pnpm v10 to v11; pnpm-workspace.yaml declares onlyBuiltDependencies: [esbuild] and ignoredBuiltDependencies: [wasm-pack] for the new opt-in build script policy.
  • Cross-language dependency bumps: org.jetbrains:annotations 26.0.0 → 26.1.0, plus updates across all language toolchains via task upgrade.

Commits
  • 66b91f9 chore: remove stray debug tmp file from 3.4.0 commit
  • d709db6 chore(release): 3.4.0
  • e3e644d ci(publish): announce releases on Discord
  • a9dc8e8 chore(release): bump to 3.4.0-rc.45
  • 4571f4a chore(release): bump to 3.4.0-rc.44
  • 04cb69d refactor(publish): consume shared homebrew bottle actions
  • b35f88c chore(rumdl): exclude alef-generated package READMEs
  • 9909d6b chore(readme): apply rumdl-fmt to Go README
  • 04c6795 chore(readme): regenerate via alef readme
  • 140b5f1 fix(publish/python): use cibw-build=cp310-* to leverage abi3
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [html-to-markdown-rs](https://github.com/kreuzberg-dev/html-to-markdown) from 3.3.3 to 3.4.0.
- [Release notes](https://github.com/kreuzberg-dev/html-to-markdown/releases)
- [Changelog](https://github.com/kreuzberg-dev/html-to-markdown/blob/main/CHANGELOG.md)
- [Commits](kreuzberg-dev/html-to-markdown@v3.3.3...v3.4.0)

---
updated-dependencies:
- dependency-name: html-to-markdown-rs
  dependency-version: 3.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot Bot commented on behalf of github May 11, 2026

Labels

The following labels could not be found: dependencies, rust. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

Sewer56 merged commit a994829 into main May 12, 2026
19 checks passed
Sewer56 deleted the dependabot/cargo/src/html-to-markdown-rs-3.4.0 branch May 12, 2026 00:00