Comparison of HTML parsers


HTML parsers are software for automated Hypertext Markup Language (HTML) parsing. They have two main purposes:

  • HTML traversal: offer programmers an interface to easily access and modify the HTML source code. Canonical example: DOM parsers.
  • HTML cleaning: fix invalid HTML and improve the layout and indent style of the resulting markup. Canonical example: HTML Tidy.
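
The traversal purpose described above can be illustrated with a minimal sketch. It assumes Python's standard-library `html.parser` module, which is not one of the parsers compared in this article and is event-driven rather than a DOM parser. The names `LinkCollector` and `collect_links` are hypothetical, chosen only for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Return the list of link targets found in an HTML string (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p>See <a href="https://example.org/a">A</a> and <A HREF="/b">B</a>.</p>'
print(collect_links(sample))  # ['https://example.org/a', '/b']
```

A DOM-style parser such as those in the table would instead build a tree that can be queried and modified after parsing; the callback approach here only shows the access half of traversal.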

| Parser         | License            | Implementation language(s) | Latest date*  | HTML parsing[1] | HTML5-compliant parsing | Clean HTML** | Update HTML*** |
|----------------|--------------------|----------------------------|---------------|-----------------|-------------------------|--------------|----------------|
| HTML Tidy      | W3C license        | ANSI C                     | 2021-07-17[2] | Yes[3]          | Yes                     | Yes[3]       | Yes            |
| HtmlUnit       | Apache License 2.0 | Java                       | 2023-10-31[4] | Yes             | ?                       | No           | No             |
| Beautiful Soup | MIT License        | Python                     | 2023-04-07[5] | Yes             | Yes                     | ?            | No             |
| jsoup          | MIT License        | Java                       | 2024-07-10[6] | Yes             | Yes                     | Yes          | Yes            |

* Date of the latest release containing significant changes.
** Sanitizing (generating standards-compliant web pages, reducing spam, etc.) and cleaning (stripping out surplus presentational tags, removing XSS code, etc.) HTML code.
*** Updating HTML 4.x to XHTML or to HTML5, converting deprecated tags (e.g. CENTER) to valid ones (e.g. DIV with style="text-align:center;").
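
The CENTER-to-DIV conversion mentioned in the last note can be sketched as follows. This is an illustrative sketch only, assuming Python's standard-library `html.parser`; it does not show how HTML Tidy or jsoup implement the update. The names `CenterRewriter` and `update_center` are hypothetical, and the sketch handles only the constructs for which it defines a callback.

```python
from html.parser import HTMLParser


class CenterRewriter(HTMLParser):
    """Re-emits the input markup, replacing <center> with a styled <div>."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "center":
            self.out.append('<div style="text-align:center;">')
        else:
            # Reuse the original text of the start tag unchanged.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</div>" if tag == "center" else f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def update_center(html):
    """Return html with deprecated <center> elements rewritten (hypothetical helper)."""
    rewriter = CenterRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out)


print(update_center("<CENTER>Hi &amp; <b>bye</b></CENTER>"))
```

Because the parser reports end-tag names in lowercase, closing tags other than CENTER are re-emitted in lowercase; a production tool would also need to handle malformed nesting, which this sketch does not attempt.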

