Error getting data from website

Michael Torrie wrote:

> On 12/6/19 5:31 PM, DL Neil via Python-list wrote:
>> If you read the HTML data that the REPL has happily splattered all over
>> your terminal's screen (scroll back) (NB "soup" is easier to read than
>> is "content"!) you will observe that what you saw in your web-browser is
>> not what Amazon served in response to the Python "requests.get()"!
> Sadly it's likely that Amazon's page is largely built from javascript.

That's not the problem here. Quoting the HTML returned by the requests.get() call:


To discuss automated access to Amazon data please contact
api-services-support@amazon.com.
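
A quick way to tell the refusal page from the real one is to search the response body for that sentence. This is only a minimal sketch: the function name and the marker constant are my own inventions, and the marker is simply the sentence quoted above, which Amazon may reword at any time.

```python
# Sentence quoted above from the refusal page; using it as a marker is an assumption.
BLOCK_MARKER = "To discuss automated access to Amazon data"


def looks_blocked(html: str) -> bool:
    """Return True if the HTML contains the automated-access notice."""
    # Collapse runs of whitespace so a line break inside the sentence does not hide it.
    normalized = " ".join(html.split()).lower()
    return BLOCK_MARKER.lower() in normalized
```

With requests you would pass it response.text before trying to parse anything; if it returns True, there is no price in the page to find.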

If you retrieve the page manually:

$ wget "https://www.amazon.ca/dp/B07RZFQ6HC" -O tmp.gz
2019-12-07 11:47:03 (80.6 KB/s) - 'tmp.gz' saved [115426]

$ gunzip tmp.gz
$ python3
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open("tmp").read(), "html.parser")
>>> soup.find("span", dict(id="priceblock_dealprice"))
<span class="a-size-medium a-color-price priceBlockDealPriceString" 
id="priceblock_dealprice">CDN$ 1,019.00</span>
>>> _.text
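
If you want a number rather than a string, the text still has to be cleaned up. A minimal sketch, assuming English-style formatting (comma as thousands separator, dot as decimal point) as in the span above; parse_price is a hypothetical helper of mine, not part of BeautifulSoup:

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Extract the numeric amount from a price string such as 'CDN$ 1,019.00'."""
    # Take the first run of digits, allowing thousands separators and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    # Drop the thousands separators before converting.
    return Decimal(match.group().replace(",", ""))
```

For the span shown above, parse_price("CDN$ 1,019.00") gives Decimal('1019.00'). Decimal avoids the rounding surprises you get when storing money in a float.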

> So scraping static html is probably not going to get you where you want
> to go.  

... because Amazon doesn't like what you do. You can cheat, or play by their
rules and use the API.

> There are heavier tools, such as Selenium that uses a real
> browser to grab a page, and the result of that you can parse and search
> perhaps.