[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] Need help to fix HTTP Header Injection vulnerability

On 10.04.2019 7:30, Karthikeyan wrote:
> Thanks Gregory. I think it's a good tradeoff to ensure this validation only for URLs of http scheme.
> I also agree handling newline is little problematic over the years and the discussion over the level at which validation should occur also 
> prolongs some of the patches. https://bugs.python.org/issue35906 is another similar case where splitlines is used but it's better to raise 
> an error and the proposed fix could be used there too. Victor seemed to wrote a similar PR like linked one for other urllib functions only 
> to fix similar attack in ftplib to reject newlines that was eventually fixed only in ftplib
> * https://bugs.python.org/issue30713
> * https://bugs.python.org/issue29606
> Search also brings multiple issues with one duplicate over another that makes these attacks scattered over the tracker and some edge case 
> missing. Slightly off topic, the last time I reported a cookie related issue where the policy can be overriden by third party library I 
> was asked to fix it in stdlib itself since adding fixes to libraries causes maintenance burden to downstream libraries to keep up 
> upstream. With urllib being a heavily used module across ecosystem it's good to have a fix landing in stdlib that secures downstream 
> libraries encouraging users to upgrade Python too.
Validation should occur whenever user data crosses a trust boundary -- i.e. when the library starts to assume that an extracted chunk now 
contains something valid.

https://tools.ietf.org/html/rfc3986 defines valid syntax (incl. valid characters) for every part of a URL -- _of any scheme_ (FYI, \r\n are 
invalid everywhere and the test code for ??? `data:' that Karthikeyan referred to is raw data to compare to rather than a part of a URL). It 
also obsoletes all the RFCs that the current code is written against.

AFAICS, urllib.split* fns (obsoleted as public in 3.8) are used by both urllib and urllib2 to parse URLs. They can be made to each validate 
the chunk that they split off. urlparse can validate the entire URL altogether.

Also, all modules ought to use the same code (urlparse looks like the best candidate) to parse URLs -- this will minimize the attack surface.

I think I can look into this later this week.

Fixing this is going to break code that relies on the current code accepting invalid URLs. But the docs have never said that e.g. in 
urlopen, anything apart from a (valid) URL is accepted (in particular, this implies that the user is responsible for escaping stuff properly 
before passing it). So I would say that we are within our right here and whoever is relying on those quirks is and has always been on 
unsupported territory.
Determining which of those quirks are exploitable and which are not to fix just the former is an incomparably larger, more error-prone and 
avoidable work. If anything, the history of the issue referenced to by previous posters clearly shows that this is too much to ask from the 
Python team.

I also see other undocumented behavior like accepting '<URL:<url>>' (also obsoleted as public in 3.8) which I would like to but it's of no harm.



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20190410/69b896de/attachment.html>