osdir.com



Error getting data from website


On 7/12/19 12:53 PM, Sam Paython wrote:
> This is the code I am writing:
> import requests
> from bs4 import BeautifulSoup
> request = requests.get("https://www.amazon.ca/dp/B07RZFQ6HC")
> content = request.content
> soup = BeautifulSoup(content, "html.parser")
> element = soup.find("span",{"id":"priceblock_dealprice"})
> print(element.text.strip())
> 
> and this is the error I am getting:
> C:\Users\Sam\PycharmProjects\untitled2\venv\Scripts\python.exe C:/Users/Sam/PycharmProjects/untitled2/src/app.py
> Traceback (most recent call last):
>    File "C:/Users/Sam/PycharmProjects/untitled2/src/app.py", line 9, in <module>
>      print(element.text.strip())
> AttributeError: 'NoneType' object has no attribute 'text'
> 
> Could someone please help?


The error message/stack-trace is your friend! The comment about "NoneType" 
means (roughly!) 'there's nothing there': "element" is None, so it has no 
".text" for print() to show.

The question then becomes: "why?" or "why not?"...

With a short piece of code like this, and (I am assuming) trying-out a 
library for the first time, may I recommend that you use the Python 
REPL, because it allows you to 'see' what's going-on behind the 
scenes/underneath the hood - and ultimately, reveals the problem.

From a terminal (the command that starts Python depends on your PC's 
operating system):

[dn at JrBrown ~]$ python3
Python 3.7.4 (default, Jul  9 2019, 16:48:28)
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
 >>> import requests
 >>> from bs4 import BeautifulSoup
 >>> request = requests.get("https://www.amazon.ca/dp/B07RZFQ6HC")
 >>> request		# notice how I'm asking to 'see' what happened
<Response [503]>
 >>> content = request.content
 >>> content		# there is no need to enclose in print()!
b'<!DOCTYPE html>\n<!--[if lt IE 7]> <html lang="en-us" class="a-no-js 
...many lines of HTML, excised in the interests of brevity...
\')[0].appendChild(elem);\n    }\n    </script>\n</body></html>\n'
 >>> soup = BeautifulSoup(content, "html.parser")
 >>> soup
<!DOCTYPE html>
...many more lines of HTML...
</body></html>

 >>> element = soup.find("span",{"id":"priceblock_dealprice"})
 >>> element
 >>>

The last entry asks for the contents of "element" to be displayed. The 
REPL shows nothing, because element is None and the REPL does not echo 
None. Oops!
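
To avoid the AttributeError in the original script, one option is to test 
for None before touching ".text". What follows is only a minimal sketch: 
the helper name element_text() and its "default" parameter are my own 
invention, not part of BeautifulSoup. It relies only on soup.find() 
returning None when nothing matches, as seen in the session above.

```python
def element_text(soup, tag, attrs, default=None):
    """Return the stripped text of the first match, or `default` if none.

    `soup` is anything with a BeautifulSoup-style find(tag, attrs) method.
    """
    element = soup.find(tag, attrs)  # None when nothing matches
    if element is None:
        return default
    return element.text.strip()


# Usage with the names from the session above (hypothetical helper):
#   price = element_text(soup, "span", {"id": "priceblock_dealprice"})
#   print(price if price is not None else "price element not found")
```

With the page that was actually served, this would report "price element 
not found" rather than crash.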


Working 'backwards' (and using 'simple' Python functions to prove that 
it is not our use of requests/BS4 that is at-fault):

 >>> soup.find( "price" )		# looks for a <price> tag: none found

 >>> content.find( b"price" )		# the b"" is necessary because
					# we are dealing with bytes
					# not a Unicode string
-1
 >>>

Sadly, the -1 indicates that "price" was not found. Which is bound to be 
disappointing to you.
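
The bytes-versus-string point can be tried out without going anywhere near 
Amazon. This is a small sketch using made-up sample bytes (not real page 
content) in place of request.content; the variable names are mine.

```python
# Made-up sample bytes, standing in for request.content
content = b"<span id='priceblock_dealprice'>$19.99</span>"

hit = content.find(b"price")     # index of the first match (0 or more)
miss = content.find(b"banana")   # -1 means "not found"

# Searching bytes with a str pattern is an error, hence the b"" prefix
try:
    content.find("price")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Alternative: decode to str first, then search with an ordinary string
text = content.decode("utf-8")
hit_in_text = text.find("price")

print(hit, miss, mixed_ok, hit_in_text)
```

Decoding as UTF-8 is an assumption about the page's encoding; it suits this 
sample, but a real page may declare something else.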


Yet all is not lost!

If you read the HTML that the REPL has happily splattered all over 
your terminal's screen (scroll back; NB "soup" is easier to read than 
"content"!), you will observe that what you saw in your web-browser is 
not what Amazon served in response to the Python "requests.get()"! The 
<Response [503]> shown earlier was the first hint: a 503 status means 
the server did not deliver the page that was asked for.
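
Since the session above showed <Response [503]>, it may be worth checking 
the status before handing anything to BeautifulSoup. A minimal sketch, 
assuming only that the response object has "status_code" and "content" 
attributes (as requests' Response does); the helper name body_if_ok() and 
the variable "url" are hypothetical.

```python
def body_if_ok(response):
    """Return response.content only if the status code is 200; otherwise raise."""
    if response.status_code != 200:
        raise RuntimeError(f"server answered with HTTP {response.status_code}")
    return response.content


# Usage with the names from the session above (url is a placeholder):
#   request = requests.get(url)
#   soup = BeautifulSoup(body_if_ok(request), "html.parser")
```

requests also offers request.raise_for_status(), which raises an exception 
for 4xx and 5xx responses and may be all you need. Neither approach makes 
Amazon serve the page; they only make the failure visible sooner.
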
--
Regards =dn