
Re: mod_proxy_html and special characters


> On 25 May 2018, at 17:43, Micha Lenk <micha@xxxxxxxxx> wrote:
> 
>              s_from = strlen(m->from.c);
>              if (!strncasecmp(ctx->buf, m->from.c, s_from)) {
>                  ... do the string replacement ...
> 
> 
> ... where ctx->buf is the URL found in the HTML document, and m->from.c is the first configured argument of ProxyHTMLURLMap. So, if the latter is a prefix of the former, this condition should be true and the string replacement should happen. When the expected string replacement doesn't happen, the condition is false and the values of the variables are:
> 
> ctx->buf  = http://internal/!%22%23$/
> m->from.c = http://internal/!"#$/
> 
> So, the strings don't match and are not replaced for that reason.

Yep.  mod_proxy_html takes what it sees.  That's why it relies on another module
(mod_xml2enc) for i18n, which is kind-of what I expected to see from your
subject line!

> Going forward, I am not interested in finding a workaround for this, but rather in how to approach a fix (if this is a bug at all).
> 
> Is it reasonable to expect mod_proxy_html to rewrite URL-encoded URLs as well?

I think it's reasonable to use the escaped form, as it appears in the HTML, in your ProxyHTMLURLMap.
If we have mod_proxy_html unescape characters, it adds complexity to the code,
and (perhaps more to the point) presents a mirror-image of your problem to
anyone with the opposite expectations.
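
As a rough illustration, here is a minimal standalone sketch of the same
byte-wise prefix test, applied to the two values from the report. It is not
code from mod_proxy_html; the helper name has_prefix_ci is made up for the
example, and the escaped "from" string simply mirrors the URL as it appears
in the HTML.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper, not part of mod_proxy_html: true when `from`
 * is a case-insensitive prefix of `url`. This is the same byte-wise
 * test as the quoted condition; nothing is unescaped on either side. */
static bool has_prefix_ci(const char *url, const char *from)
{
    size_t s_from = strlen(from);
    return strncasecmp(url, from, s_from) == 0;
}

/* Usage, with the URL as it appears in the HTML document:
 *
 *   has_prefix_ci("http://internal/!%22%23$/", "http://internal/!\"#$/")
 *       is false: '%' and '"' differ at the first escaped character.
 *
 *   has_prefix_ci("http://internal/!%22%23$/", "http://internal/!%22%23$/")
 *       is true: the configured value uses the same escaped form.
 */
```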

> Let's assume this needs to be fixed. To make the strings match, we could either URL-escape the value from the Apache directive ProxyHTMLURLMap, or temporarily URL-decode the string found in the HTML document just for the purpose of the string comparison. What is the right thing to do?

I prefer to leave it to server admins to find the match that works for them.
I don't recollect this particular question ever arising in 15 years, which kind-of
suggests users are not confused by it!
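
For reference, the second option in the question (decoding a temporary copy
just for the comparison) might look roughly like the sketch below. It is only
an illustration of the idea, not a proposed patch: hex_val and
url_decode_copy are hypothetical names, and nothing here is taken from the
module's source.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: numeric value of one hex digit, or -1. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: percent-decode `src` into `dst`, whose capacity
 * `cap` includes the terminating NUL. Malformed escapes are copied
 * through unchanged. Returns 0 on success, -1 if `dst` is too small.
 * Caveat: "%00" decodes to a NUL byte and so truncates the result;
 * real code would have to decide how to treat that. */
static int url_decode_copy(char *dst, size_t cap, const char *src)
{
    size_t n = 0;
    while (*src) {
        int hi, lo;
        if (n + 1 >= cap) return -1;  /* keep room for the NUL */
        if (src[0] == '%'
            && (hi = hex_val((unsigned char)src[1])) >= 0
            && (lo = hex_val((unsigned char)src[2])) >= 0) {
            dst[n++] = (char)(hi * 16 + lo);
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    if (n >= cap) return -1;          /* only reachable when cap == 0 */
    dst[n] = '\0';
    return 0;
}
```

Decoding "http://internal/!%22%23$/" this way yields http://internal/!"#$/,
which would then match the unescaped directive value. A site that configured
the escaped form would instead need the directive value decoded as well, which
is the mirror-image problem mentioned above.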

-- 
Nick Kew