osdir.com


Scraping multiple web pages help


Hello everyone,

For a research project, I need to scrape a lot of comments from
regulations.gov

https://www.regulations.gov/docketBrowser?rpp=25&so=DESC&sb=commentDueDate&po=0&dct=PS&D=ED-2018-OCR-0064

What's partly throwing me is the URLs of the comments. They
aren't consistent. I mean, there's some consistency insofar as the numbers
that differentiate the pages all begin after that 0064 number in the URL
listed above. But the differentiating numbers don't even all have the same
number of digits. Some have 4 (say, 4019) whereas others have 5 (say,
50343). I don't think they go over 5, though. So this is a problem: I don't
know how to write the code to access the multiple pages.
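
To make the URL problem concrete, here is a minimal Python sketch of the
two pieces involved. It only builds strings and fetches nothing. It
assumes, without confirmation from the site, that a comment's ID is the
docket ID plus "-" plus the number, that "po" in the listing URL is a
result offset with "rpp" results per page, and that a comment page lives
at the hypothetical address in COMMENT_URL_TEMPLATE. All function names
are made up for illustration, and each assumption should be checked in a
browser first.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# The docket listing URL from the post above.
LISTING_URL = (
    "https://www.regulations.gov/docketBrowser"
    "?rpp=25&so=DESC&sb=commentDueDate&po=0&dct=PS&D=ED-2018-OCR-0064"
)

# ASSUMPTION: hypothetical pattern for a single comment page; verify it first.
COMMENT_URL_TEMPLATE = "https://www.regulations.gov/document?D={comment_id}"


def docket_id(listing_url):
    """Read the docket ID from the D= query parameter of the listing URL."""
    return parse_qs(urlsplit(listing_url).query)["D"][0]


def comment_id(docket, number):
    """Append the comment number as-is, so 4 and 5 digit numbers both work."""
    return f"{docket}-{number}"


def comment_url(docket, number):
    """Fill the (assumed) comment URL pattern with one comment ID."""
    return COMMENT_URL_TEMPLATE.format(comment_id=comment_id(docket, number))


def listing_page_url(listing_url, page, per_page=25):
    """Build the URL of one listing page.

    ASSUMPTION: 'po' is an offset into the results and 'rpp' is the page
    size, so page 0 starts at offset 0, page 1 at offset 25, and so on.
    """
    parts = urlsplit(listing_url)
    query = parse_qs(parts.query)
    query["rpp"] = [str(per_page)]
    query["po"] = [str(page * per_page)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    docket = docket_id(LISTING_URL)
    for number in (4019, 50343):
        print(comment_url(docket, number))
    print(listing_page_url(LISTING_URL, page=1))
```

Because the number is appended as plain text, the differing digit counts
need no special handling. The numbers themselves would still have to be
collected from the listing pages rather than guessed, since nothing above
says they are contiguous.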

I should also mention I'm new to programming, so that's also a problem (if
you can't already tell by the way I'm describing my problem).


I should also mention that, I think, there's an API on regulations.gov, but
I'm such a beginner that I don't even really know where to find it, or even
what to do with it once I do. That's how helpless I am right now.

Any help anyone could offer would be much appreciated.

D