Module documentation
Implements Lua functions mw.text.decode, mw.text.encode in a module.
{{#invoke:decodeEncode|decode|s=Source text&copy;}}
→Source text©
See List of XML and HTML character entity references.
Decode (&copy; → ©)
- Decodes Named Entities from entity name into a regular (Unicode) character:
&copy; → ©
&gt; → >
All well-defined named entities are decoded (HTML Named character references, formally: as defined in the PHP table).
- A regular, rendered sentence:
- "At 100 °F, & with a "burning" sun above, we ⁄walked⁄."
- In code:
- "At 100 &deg;F, &amp; with a &quot;burning&quot; sun above, we &frasl;walked&frasl;." -- wikitext
- Processing:
{{#invoke:decodeEncode|decode|s=At 100 &deg;F, &amp; with a &quot;burning&quot; sun above, we &frasl;walked&frasl;.}}
→At 100 °F, & with a "burning" sun above, we ⁄walked⁄.
-- In code: straight characters, no named entities.
- Renders, again:
- "At 100 °F, & with a "burning" sun above, we ⁄walked⁄."
Decode a reduced set only
By setting |subset_only=true, only these five entity names are decoded: '&lt;', '&gt;', '&amp;', '&quot;', '&nbsp;' (that is, into '<', '>', '&', '"', and a non-breaking space).
- Note: This differs from the corresponding Lua parameter, which matters only if you also work directly with the Lua function mw.text.decode. The Lua documentation defines the parameter decodeNamedEntities: when it is omitted or false, only the reduced set of entities is recognized and decoded. |subset_only= inverts this sense: decodeNamedEntities=false is equivalent to |subset_only=true.
- Also, this module ignores the "omitted" logic: |subset_only= must be set explicitly to 'true' to take effect (the module also accepts 'yes' and '1').
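The inverted flag can be sketched in plain Lua. This is a minimal illustration, not the module itself: toBoolean mirrors the module's explicit-true rule, and decodeNamedEntitiesFor is a hypothetical name for the value the module hands to mw.text.decode as its second argument.

```lua
-- Sketch only: models how |subset_only= maps onto the second argument
-- of mw.text.decode. Both function names are hypothetical.

-- Only an explicit 'true', 'yes' or '1' (case-insensitive) enables the flag;
-- an omitted or empty value counts as false.
local function toBoolean(value)
    if type(value) == 'string' then
        value = value:lower()
        return value == 'true' or value == 'yes' or value == '1'
    elseif type(value) == 'boolean' then
        return value
    end
    return false
end

-- The module passes `not subset_only` as decodeNamedEntities.
local function decodeNamedEntitiesFor(subsetOnlyArg)
    return not toBoolean(subsetOnlyArg)
end

print(decodeNamedEntitiesFor('true'))  -- false: reduced set only
print(decodeNamedEntitiesFor(nil))     -- true: all named entities
print(decodeNamedEntitiesFor(''))      -- true: an empty value is not 'true'
```

In other words, leaving |subset_only= out gives full named-entity decoding, which is the opposite of what omitting decodeNamedEntities does in a direct mw.text.decode call.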
Encode (© → &#169;)
- Function encode encodes some entity-named characters into that name (for example: & → &amp;).
Regular sentence:
- "At >100 °F, & with a "burning" sun above, we walked. ©"
In code:
- "At >100 °F, & with a "burning" sun above, we walked. ©"
Encode:
{{#invoke:decodeEncode|encode|s=At >100 °F, & with a "burning" sun above, we walked. ©|charset=&<>{{!}}°"'&©}}
- → At &gt;100 &#176;F, &amp; with a &quot;burning&quot; sun above, we walked. &#169;
- Renders as:
- "At >100 °F, & with a "burning" sun above, we walked. ©"
Character set to encode
Per the Lua documentation, only a small set of characters is processed by default. The character set can be set (expanded) by using |charset=.
- Example: |charset=<>" \'& (the default), |charset=<>°"'&©{{!}}; characters not in the default will be replaced by their decimal entity: © → &#169; (decimal number, not hexadecimal nor named &copy;)
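The charset behaviour described above can be modelled in plain Lua, outside Scribunto. The sketch below is an approximation under stated assumptions, not the mw.text.encode implementation: encodeModel, named and codepoint are hypothetical names, each character in charset is treated literally (no pattern ranges or escapes), and NUL bytes are not handled.

```lua
-- Sketch only: a plain-Lua model of the charset behaviour.
-- `encodeModel`, `named` and `codepoint` are hypothetical names.

-- Characters that are replaced by a named entity.
local named = {
    ['<'] = '&lt;',
    ['>'] = '&gt;',
    ['&'] = '&amp;',
    ['"'] = '&quot;',
    ['\194\160'] = '&nbsp;', -- U+00A0 NO-BREAK SPACE as UTF-8 bytes
}

-- Matches one UTF-8 encoded character.
local UTF8_CHAR = '[\1-\127\194-\244][\128-\191]*'

-- Returns the code point of a single UTF-8 character.
local function codepoint(ch)
    local b1 = ch:byte(1)
    if b1 < 128 then
        return b1
    end
    local cp
    if b1 >= 240 then
        cp = b1 - 240
    elseif b1 >= 224 then
        cp = b1 - 224
    else
        cp = b1 - 192
    end
    for i = 2, #ch do
        cp = cp * 64 + (ch:byte(i) - 128)
    end
    return cp
end

local function encodeModel(s, charset)
    -- Collect the characters to encode into a lookup set.
    local wanted = {}
    for ch in charset:gmatch(UTF8_CHAR) do
        wanted[ch] = true
    end
    -- Replace wanted characters; returning nil keeps a character unchanged.
    return (s:gsub(UTF8_CHAR, function(ch)
        if not wanted[ch] then
            return nil
        end
        return named[ch] or string.format('&#%d;', codepoint(ch))
    end))
end

print(encodeModel('At >100 °F, & we walked. ©', '<>°"\'&©|'))
-- prints: At &gt;100 &#176;F, &amp; we walked. &#169;
```

Characters outside charset pass through untouched, so encodeModel('©', '<>&"') returns '©' unchanged.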
Known issues
- 13 Sep 2021: NOTE: The encode function with user-supplied charset is now used in production in {{R/superscript}} and {{R/ref}}. Before implementing breaking changes here, these templates need to be adjusted accordingly!
- 26 Sep 2021: U+2009 THIN SPACE (&thinsp;, &#8201;)
- Note: Possible bug: Decoding &#8201; works, but &thinsp; doesn't.
- Resolved in code.
- 4 Feb 2023: U+03B5 ε GREEK SMALL LETTER EPSILON (&epsilon;, &#949;)
- See Module talk:DecodeEncode § Bug report: bad decoding of U+03B5 ε (epsilon)
- Resolved in code.
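Both fixes rest on the same idea: rewrite the affected named entity to its numeric form before decoding. A minimal plain-Lua sketch of that pre-substitution follows; WORKAROUNDS and applyWorkarounds are hypothetical names, and the module itself performs the replacement with mw.ustring.gsub.

```lua
-- Sketch only: the pre-substitution idea behind the two workarounds.
-- `WORKAROUNDS` and `applyWorkarounds` are hypothetical names.
local WORKAROUNDS = {
    { '&thinsp;',  '&#8201;' }, -- U+2009 THIN SPACE
    { '&epsilon;', '&#949;'  }, -- U+03B5 GREEK SMALL LETTER EPSILON
}

local function applyWorkarounds(s)
    for _, pair in ipairs(WORKAROUNDS) do
        -- Neither string contains Lua pattern magic characters,
        -- so a plain gsub is sufficient here.
        s = s:gsub(pair[1], pair[2])
    end
    return s
end

print(applyWorkarounds('5&thinsp;km, &epsilon; = 0.1'))
-- prints: 5&#8201;km, &#949; = 0.1
```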
See also
require('strict')

local p = {}

local function _getBoolean( boolean_str )
    -- from: module:String; adapted
    -- requires an explicit true
    local boolean_value

    if type( boolean_str ) == 'string' then
        boolean_str = boolean_str:lower()
        if boolean_str == 'true' or boolean_str == 'yes' or boolean_str == '1' then
            boolean_value = true
        else
            boolean_value = false
        end
    elseif type( boolean_str ) == 'boolean' then
        boolean_value = boolean_str
    else
        boolean_value = false
    end
    return boolean_value
end

function p.decode( frame )
    local s = frame.args['s'] or ''
    local subset_only = _getBoolean(frame.args['subset_only'] or false)

    return p._decode( s, subset_only )
end

function p._decode( s, subset_only )
    -- U+2009 THIN SPACE: workaround for bug: HTML entity &thinsp; is decoded incorrectly. Entity &#8201; gets decoded properly
    s = mw.ustring.gsub( s, '&thinsp;', '&#8201;' )
    -- U+03B5 ε GREEK SMALL LETTER EPSILON: workaround for bug (phab:T328840): HTML entity &epsilon; is decoded incorrectly for gsub(). Entity &#949; gets decoded properly
    s = mw.ustring.gsub( s, '&epsilon;', '&#949;' )

    local ret = mw.text.decode( s, not subset_only )

    return ret
end

function p.encode( frame )
    local s = frame.args['s'] or ''
    local charset = frame.args['charset']

    return p._encode( s, charset )
end

function p._encode( s, charset )
    -- example: charset = '_&©−°\\\"\'\=' -- do escape with backslash not %;
    local ret

    if charset and charset ~= '' then
        ret = mw.text.encode( s, charset )
    else
        -- use default: charset = '<>&"\' ' (outer quotes = lua required; space = NBSP)
        ret = mw.text.encode( s )
    end

    return ret
end

return p