HTML2Markdown

app programming Swift Fedicat markdown Mastodon fediverse HTML
Author: Phil Chu
Making software since the 80s

After the latest tweak to the not-really-an-HTML-parser in my fediverse app, I decided it was time for a real HTML-to-markdown converter. The old code does use the SwiftSoup HTML parser, but only to extract plain text, and it relies on a bunch of pre- and post-processing hacks to make the result look OK.

Fortunately, I ran across HTML2Markdown, already modified from the original to use SwiftSoup and add options tailored to converting HTML produced by Mastodon.

I forked it to add some customizations for my app. So far they include handling the h1-h6 header tags (not produced by Mastodon as far as I know, but I encountered them in Firefish announcement text) and boldfacing hashtags and mentions while removing their href links (I display tappable mentions and tags separately underneath the post).
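To make the header handling concrete, here is a minimal sketch of how an h1-h6 element could be turned into markdown. It is not the code in the fork: `markdownHeader` and its `inlineOnly` parameter are hypothetical names, and falling back to a boldface line is an assumption about what makes sense for renderers such as SwiftUI's Text, which handles inline markdown but not headings.

```swift
import Foundation

/// Hypothetical helper, not part of HTML2Markdown: renders the text of an
/// <h1>...<h6> element as markdown.
func markdownHeader(level: Int, text: String, inlineOnly: Bool) -> String {
    let title = text.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !title.isEmpty else { return "" }

    if inlineOnly {
        // Assumption: an inline-only renderer gets a bold line instead of a heading.
        return "**\(title)**\n\n"
    }

    // Clamp to the six heading levels that HTML defines.
    let clamped = min(max(level, 1), 6)
    return String(repeating: "#", count: clamped) + " " + title + "\n\n"
}
```

Calling `markdownHeader(level: 2, text: "Announcement", inlineOnly: true)` yields a bold line rather than a `##` heading, so the header still stands out where headings are not rendered.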

Here’s how I call the converter, from a String extension:

let dom = try HTMLParser().parse(html: self)
return dom.markdownFormatted(options: [
    .escapeMarkdown,
    .swiftui,
    .mastodon,
    .boldTag,
    .boldMention,
    .unorderedListBullets
])

The .escapeMarkdown option escapes any characters already in the text that might accidentally be rendered as markdown, .mastodon makes a few adjustments for Mastodon-generated HTML, and .unorderedListBullets gives unordered lists nicer bullets.
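As an illustration of what escaping means here, the sketch below backslash-escapes a handful of characters that markdown treats specially. It is not the library's implementation: the function name and the character set are assumptions chosen for the example.

```swift
/// Hypothetical sketch, not HTML2Markdown's own code: prefixes characters that
/// markdown would otherwise interpret with a backslash.
func escapeMarkdown(_ text: String) -> String {
    // Assumed set of characters worth escaping; a real converter may use a different one.
    let special: Set<Character> = ["\\", "`", "*", "_", "[", "]", "~"]

    var result = ""
    result.reserveCapacity(text.count)
    for character in text {
        if special.contains(character) {
            result.append("\\")
        }
        result.append(character)
    }
    return result
}
```

With this in place, a post that literally contains `snake_case` or `2 * 3 * 4` keeps its underscores and asterisks instead of turning into italics.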

I added .swiftui to generate only markdown that SwiftUI can render (this applies to the header tags), and .boldTag and .boldMention to turn hashtags and mentions into boldface without links. That code might not be very robust, though, as it just checks whether links have an @ or # prefix. Mastodon adds hashtag and mention classes to those links, but I can't rely on that from other platforms.
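The prefix check can be sketched roughly as follows. This is not the fork's actual code: `LinkKind`, `classifyLink`, and `renderLink` are hypothetical names, and the sketch assumes the check is made against the visible link text rather than the href.

```swift
import Foundation

/// Hypothetical classification of an <a> element by its visible text.
enum LinkKind {
    case hashtag, mention, ordinary
}

func classifyLink(text: String) -> LinkKind {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    // A lone "#" or "@" is not treated as a tag or mention.
    guard trimmed.count > 1 else { return .ordinary }
    if trimmed.hasPrefix("#") { return .hashtag }
    if trimmed.hasPrefix("@") { return .mention }
    return .ordinary
}

/// Renders a link as markdown, dropping the href for hashtags and mentions
/// when the corresponding (assumed) flag is set.
func renderLink(text: String, href: String, boldTag: Bool, boldMention: Bool) -> String {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    switch classifyLink(text: text) {
    case .hashtag where boldTag:
        return "**\(trimmed)**"
    case .mention where boldMention:
        return "**\(trimmed)**"
    default:
        return "[\(trimmed)](\(href))"
    }
}
```

The weakness is easy to see: any ordinary link whose text happens to start with # or @ loses its href, which is why checking the hashtag and mention classes would be safer wherever they are available.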