Scraping multiple web pages help

Hi there Drake,

 A quick Google search turned up the developer documentation:
 - https://regulationsgov.github.io/developers/

 This seems particularly useful:
 - https://regulationsgov.github.io/developers/console/
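 As a rough sketch of where to start: the URL in your message already carries
paging parameters in its query string, so you should not need to guess the
individual comment numbers. I am assuming here that `po` is the page offset
and `rpp` the number of results per page; that is my reading of the URL, not
something I have checked against the docs. The helper name `page_urls` is made
up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

# The listing URL from the original message.
DOCKET_URL = (
    "https://www.regulations.gov/docketBrowser"
    "?rpp=25&so=DESC&sb=commentDueDate&po=0&dct=PS&D=ED-2018-OCR-0064"
)


def page_urls(url, pages):
    """Return `pages` listing URLs, stepping the offset `po` by `rpp`.

    Assumption: `po` is a zero-based result offset and `rpp` is the
    page size. Verify this against the site before relying on it.
    """
    parts = urlsplit(url)
    # parse_qs returns a list of values per key; keep the first of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    per_page = int(query["rpp"])
    urls = []
    for page in range(pages):
        query["po"] = str(page * per_page)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls
```

 Calling `page_urls(DOCKET_URL, 3)` gives the same URL with `po=0`, `po=25`
and `po=50`. The listing page itself may be built with JavaScript, in which
case fetching these URLs directly will not return the comments and the API is
the more reliable route.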

 And to fetch data from the API, there's the Python requests library, which
has rather wonderful documentation:
 - http://docs.python-requests.org/en/master/
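 To give an idea of what that looks like, here is a minimal sketch of paging
through an API with requests. The endpoint in `API_URL` and the parameter
names `api_key`, `dktid`, `dct`, `rpp` and `po` are my assumptions about the
regulations.gov API, so please confirm them in the developer docs above before
using this. The function names and the `get` argument are hypothetical; `get`
only exists so the paging logic can be tried out without touching the network.

```python
# Assumed endpoint; check the developer docs for the current one.
API_URL = "https://api.data.gov/regulations/v3/documents.json"


def fetch_json(url, params, get=None):
    """GET `url` with query `params` and return the decoded JSON body."""
    if get is None:
        import requests  # third-party package: pip install requests
        get = requests.get
    response = get(url, params=params, timeout=30)
    # Raise an exception on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.json()


def fetch_pages(url, base_params, pages, per_page=25, get=None):
    """Fetch `pages` consecutive result pages and return them as a list.

    Assumption: the API pages with `rpp` (results per page) and
    `po` (zero-based result offset).
    """
    results = []
    for page in range(pages):
        params = dict(base_params, rpp=per_page, po=page * per_page)
        results.append(fetch_json(url, params, get=get))
    return results


# Example call (needs a real API key, so it is left commented out):
# pages = fetch_pages(
#     API_URL,
#     {"api_key": "YOUR_API_KEY", "dktid": "ED-2018-OCR-0064", "dct": "PS"},
#     pages=2,
# )
```

 Each item in the returned list is whatever JSON the API sends back for that
page, so print one first and look at its structure before writing code that
pulls out the comments.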



On Mon, Feb 18, 2019 at 8:22 PM Drake Gossi <drake.gossi at gmail.com> wrote:

> Hello everyone,
> For a research project, I need to scrape a lot of comments from
> regulations.gov
> https://www.regulations.gov/docketBrowser?rpp=25&so=DESC&sb=commentDueDate&po=0&dct=PS&D=ED-2018-OCR-0064
> But partly what's throwing me is the url addresses of the comments. They
> aren't consistent. I mean, there's some consistency insofar as the numbers
> that differentiate the pages all begin after that 0064 number in the url
> listed above. But the differentiating numbers aren't even all the same
> amount of numbers. Some are 4 (say, 4019) whereas others are 5 (say,
> 50343). But I don't think they go over 5. So this is a problem. I don't know
> how to write the code to access the multiple pages.
> I should also mention I'm new to programming, so that's also a problem (if
> you can't already tell by the way I'm describing my problem).
> I should also mention that, I think, there's an API on regulations.gov,
> but I'm such a beginner that I don't even really know where to find it, or
> even what to do with it once I do. That's how helpless I am right now.
> Any help anyone could offer would be much appreciated.
> D
> --
> https://mail.python.org/mailman/listinfo/python-list

Sivan Greenberg
Co founder & CTO
Vitakka Consulting