NotADirectoryError: [Errno 20] Not a directory
Youssef Abdelmohsen wrote:
> Note: Beginner
> I'm trying to create an html parser that will go through a folder and all
> its subfolders and export all html files without any html tags, in file
> formats CSV and TXT with each html labeled with the title of the web page
> in a new CSV and TXT.
> However I keep getting an error saying:
> Traceback (most recent call last):
>   File "/Users/username/Documents/htmlparser/parser10.py", line 59, in <module>
>     for subentry in os.scandir(entry.path):
> NotADirectoryError: [Errno 20] Not a directory: '/Users/username/site/.DS_Store'
> Here's what I've done so far (I have bolded line 59):
The error message says it: in the outer loop you encounter a *file* called
".DS_Store" that doesn't match your regex. You then pass it to the inner
loop, i.e. entry.path below is a file:
> for subentry in os.scandir(entry.path):
However, os.scandir() expects a *directory* rather than a file.
To fix the immediate problem you can ensure that entry is a directory:

    if entry.is_dir():
        for subentry in os.scandir(entry.path):
            ...
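
A self-contained sketch of that guard, in case it helps to try it in isolation. The function name subdirectory_entries() and the throwaway directory layout are made up for the demonstration; only os.scandir() and DirEntry.is_dir() come from the discussion above.

```python
import os
import tempfile


def subdirectory_entries(top):
    """Yield the entries one level below top, skipping plain files."""
    for entry in os.scandir(top):
        if not entry.is_dir():
            # A file such as .DS_Store; os.scandir() would raise
            # NotADirectoryError if we passed its path on.
            continue
        for subentry in os.scandir(entry.path):
            yield subentry


# Hypothetical layout: one stray file and one folder with an html file.
with tempfile.TemporaryDirectory() as site_directory:
    open(os.path.join(site_directory, ".DS_Store"), "w").close()
    os.mkdir(os.path.join(site_directory, "pages"))
    open(os.path.join(site_directory, "pages", "index.html"), "w").close()

    found = [e.name for e in subdirectory_entries(site_directory)]
    print(found)
```

The .DS_Store file is silently skipped instead of being handed to os.scandir().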
but wait a moment. I note that you copied the code to process the html file
twice. This is bad practice as it's hard to keep the code in sync when you
apply changes (you already have a bug because you refer to the `entry`
variable of the outer loop in the inner loop, too).
Instead, move that code into a helper function.
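
Such a helper might look like the sketch below. It is not your original code: the names _TextExtractor and extract_title_and_text() are invented here, and I'm assuming that "without any html tags" means the text between the tags plus the page title, using html.parser from the standard library.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the <title> text and the remaining text separately."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        elif data.strip():
            # Keep only chunks that contain something besides whitespace.
            self.text_parts.append(data.strip())


def extract_title_and_text(html):
    """Return (title, text) for one HTML document given as a string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    title = "".join(parser.title_parts).strip()
    return title, " ".join(parser.text_parts)


print(extract_title_and_text(
    "<html><head><title>Home</title></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
))
```

Note that this simple version also keeps the contents of script and style elements; you would have to filter those out separately if they matter. With the processing in one place, each loop only has to decide which files to pass to it.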
The loops then become:

    for entry in os.scandir(site_directory):
        if entry.is_dir():
            for subentry in os.scandir(entry.path):
                if subentry.is_dir():
                    for file in os.scandir(subentry.path):
                        ...  # call the helper with file.path
Hm, that still looks messy; there may be bugs.
Do you really want to exclude the html files from the intermediate level?
I'd suggest that instead you scan the whole tree. Enter os.walk():
    for path, folders, files in os.walk(site_directory):
        for name in files:
            filename = os.path.join(path, name)
While this doesn't do exactly the same thing it should be much clearer what
it does ;)
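
For reference, a runnable sketch of the os.walk() approach. The function name find_html_files(), the ".html" suffix test and the temporary directory layout are assumptions made for the demonstration, not part of your script.

```python
import os
import tempfile


def find_html_files(site_directory):
    """Return the paths of all .html files below site_directory, at any depth."""
    matches = []
    for path, folders, files in os.walk(site_directory):
        for name in files:
            # Files such as .DS_Store fail this test and are skipped.
            if name.lower().endswith(".html"):
                matches.append(os.path.join(path, name))
    return sorted(matches)


# Hypothetical tree: one html file at the top, one two levels down.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    for relative in (".DS_Store", "index.html",
                     os.path.join("a", "b", "deep.html")):
        open(os.path.join(top, relative), "w").close()

    html_files = [os.path.relpath(p, top) for p in find_html_files(top)]
    print(html_files)
```

os.walk() only ever reports files in the files list, so no explicit is_dir() check is needed, and html files at every level are found.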