osdir.com



How to compare words from .txt file against words in .xlsx file via Python? I will then extract these words by writing it to a new .xls file


On Monday, 5 August 2019 07:21:52 UTC+8, MRAB  wrote:
> On 2019-08-05 00:10, A S wrote:
> > Oh... By set did you mean by using python function set(variable) as 
> > something?
> >
> > So sorry for bothering you..
> >
> Make it a set (outside the loop):
> 
>     dictionary = set()
> 
> and then add the words to it (inside the loop):
> 
>     dictionary.add(cell_range.value)
> 
> (Maybe also rename the variable to, say, "words_wanted", because calling 
> it "dictionary" when it's not a dictionary (dict) could be confusing...)
> 
> > On Mon, 5 Aug 2019, 6:52 am A S <aishan0403 at gmail.com> wrote:
> >
> >     Previously I had tried many methods and using set was one of them
> >     but it didn't work out either.. I even tried to append it to a
> >     list but it's not working out..
> >
> >     On Mon, 5 Aug 2019, 2:29 am MRAB <python at mrabarnett.plus.com> wrote:
> >
> >         On 2019-08-04 18:53, A S wrote:
> >         > Hi Mrab,
> >         >
> >         > Thank you so much for your detailed response, I really really
> >         > appreciate it as I have been constantly trying to seek help
> >         > regarding this issue.
> >         >
> >         > Yes, I figured that the dictionary is only capturing the
> >         > last value :(
> >         > I've been trying to get it to capture and store all the
> >         > values to memory in python but it's not working..
> >         >
> >         > Are there any improvements that I could make to allow my
> >         > code to work?
> >         >
> >         > I would be truly grateful if you could provide further
> >         > insights on this..
> >         >
> >         > Thank you so much.
> >         >
> >         Make it a set and then add the words to it.
> >
> >         >
> >         > On Mon, 5 Aug 2019, 1:45 am MRAB <python at mrabarnett.plus.com> wrote:
> >         >
> >         >     On 2019-08-04 09:29, aishan0403 at gmail.com wrote:
> >         >     > I want to compare the common words from multiple .txt files
> >         >     > based on the words in multiple .xlsx files.
> >         >     >
> >         >     > Could anyone kindly help with my code? I have been stuck for
> >         >     > weeks and really need help..
> >         >     >
> >         >     > Please refer to this link:
> >         >     >
> >         >     > https://stackoverflow.com/questions/57319707/how-to-compare-words-from-txt-file-against-words-in-xlsx-file-via-python-i-wi
> >         >     >
> >         >     > Any help is greatly appreciated really!!
> >         >     >
> >         >     First of all, in this line:
> >         >
> >         >         folder_path1 = os.chdir("C:/Users/xxx/Documents/xxxx/Test python dict")
> >         >
> >         >     it changes the current working directory (not a problem), but 'chdir'
> >         >     returns None, so from that point 'folder_path1' has the value None.
> >         >
> >         >     Then in this line:
> >         >
> >         >         for file in os.listdir(folder_path1):
> >         >
> >         >     it's actually doing:
> >         >
> >         >         for file in os.listdir(None):
> >         >
> >         >     which happens to work because passing it None means to return the
> >         >     names in the current directory.
> >         >
> >         >     Now to your problem.
> >         >
> >         >     This line:
> >         >
> >         >         dictionary = cell_range.value
> >         >
> >         >     sets 'dictionary' to the value in the spreadsheet cell, and you're
> >         >     doing it each time around the loop. At the end of the loop,
> >         >     'dictionary' will be set to the _last_ such value. You're not
> >         >     collecting the value, but merely remembering the last value.
> >         >
> >         >     Looking further on, there's this line:
> >         >
> >         >         if txtwords in dictionary:
> >         >
> >         >     Remember, 'dictionary' is the last value (a string), so that'll be
> >         >     True only if 'txtwords' is a substring of the string in
> >         >     'dictionary'.
> >         >
> >         >     That's why you're seeing only one match.
> >         >
> >
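
A small sketch of the os.chdir point quoted above, assuming a temporary directory in place of the poster's real folders (the file name "fruit.xlsx" is made up for illustration). It shows that os.chdir returns None and that os.listdir(None) lists the current working directory, which is why the original loop still found the files:

```python
import os
import tempfile

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical empty file standing in for one of the .xlsx word lists.
    open(os.path.join(tmp, "fruit.xlsx"), "w").close()

    result = os.chdir(tmp)                # changes directory, returns None
    names_via_none = os.listdir(result)   # same call as os.listdir(None)
    names_via_path = os.listdir(tmp)      # explicit path, no chdir needed

    os.chdir(original_cwd)  # leave the temp dir before it is deleted

print(result)
print(names_via_none == names_via_path)
```

Passing the folder path straight to os.listdir, as in the last call, avoids relying on the None behaviour at all.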

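To make the quoted advice concrete, here is a minimal sketch with made-up cell values (cell_values is an assumption standing in for what cell_range.value would yield on each pass through the loop). It contrasts overwriting a variable with collecting into a set, and shows why "in" behaves differently on a string and on a set:

```python
# Made-up values standing in for cell_range.value on each loop iteration.
cell_values = ["apples", "orange", "pear"]

# Assignment inside the loop: only the last value survives.
dictionary = None
for value in cell_values:
    dictionary = value

# Creating the set outside the loop and adding inside it keeps every value.
words_wanted = set()
for value in cell_values:
    words_wanted.add(value)

print(dictionary)                # pear
print(sorted(words_wanted))      # ['apples', 'orange', 'pear']

# 'in' on a string is a substring test; on a set it is exact membership.
print("ear" in dictionary)       # True (substring of "pear")
print("ear" in words_wanted)     # False
print("apples" in dictionary)    # False
print("apples" in words_wanted)  # True
```
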
My latest reply to MRAB, in case anybody needs it (and P.S. I'm so sorry for spamming you, MRAB):

MRAB! Thank you so much for your constant replies! I'm able to print out the words now, using this code:

import os, sys
import xlrd
from xlrd import open_workbook
import openpyxl
from openpyxl.reader.excel import load_workbook
import xlwt
from xlwt import Workbook

#The filepath that I will be saving my .xls file to:
filepath = ('C:/Users/Ai Shan/Documents/CPFB Work/LAN SAS MONTHLY.xls')

#The .xls file:
wb2 = xlrd.open_workbook('C:\\Users\\Ai Shan\\Documents\\CPFB Work\\LAN SAS MONTHLY.xls', on_demand= True)
  
wb2 = Workbook()
sheet2 = wb2.add_sheet("LAN SAS", cell_overwrite_ok=True)

#The .xlsx files that contain the words I want to compare with the .txt files:
folder_path1 = os.chdir("C:/Users/Ai Shan/Documents/CPFB Work/Test python dict")

words= set()
for file in os.listdir(folder_path1):
    if file.endswith(".xlsx"):
        wb = load_workbook(file, data_only=True)
        ws = wb.active
        words.add(str(ws['A1'].value))
        #cell_range = ws['A1']
        #with open('copy.txt','w+') as f:
           # f.write(str(cell_range.value))
        

# Me writing the name of each .txt file to the .xls file:
for r, dir in enumerate(os.listdir("C:/Users/Ai Shan/Documents/CPFB Work/txt test python")):
   sheet2.write(r+1,1,dir)

#Reading .txt file and trying to make the sentence into words instead of lines so that I can compare the .txt individual words with the .xlsx file:
path = os.chdir("C:/Users/Ai Shan/Documents/CPFB Work/txt test python")

for name in os.listdir(path):
    if name.endswith(".txt"):
        with open(name, 'r') as texts:
            s = texts.read()
            import re
            m = re.match(r'(?:.*?\n)(?P<word>\w+?)\b', s)
            if m:
                word = m.group('word')
                if word in words:
                    print(word)
                    sheet2.write(r+1,2,word)
                    wb2.save(filepath)  

But I'm not able to write the printed values to my Excel workbook. The workbook only ends up with "pear" again.

I want to get this outcome:

apples
orange
pear

But I'm only getting the last value again. Am I writing the code wrongly?
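
One possible reading of the code above, offered as a guess rather than a confirmed diagnosis: r is left over from the earlier enumerate loop, so every sheet2.write(r+1,2,word) call targets the same cell and only the last word written stays visible. In addition, re.match returns at most one match per file. The sketch below illustrates both points without touching Excel. RecordingSheet is a hypothetical stand-in that only mimics the write(row, col, value) call shape, and the file names, texts and words are invented:

```python
import re


class RecordingSheet:
    """Hypothetical stand-in for a worksheet: keeps the last value per cell."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


words_wanted = {"apples", "orange", "pear"}
# Invented file contents keyed by file name.
texts = {
    "a.txt": "header line\napples and grapes",
    "b.txt": "header line\norange juice",
    "c.txt": "header line\npear tart",
}

# Stale index: r keeps its final value from an earlier loop.
for r, name in enumerate(texts):
    pass
stale = RecordingSheet()
for name, text in texts.items():
    for word in re.findall(r"\w+", text):
        if word in words_wanted:
            stale.write(r + 1, 2, word)  # same cell every time

# Fresh index per file: each file's matches land on that file's own row.
fixed = RecordingSheet()
for row, (name, text) in enumerate(texts.items(), start=1):
    matches = [w for w in re.findall(r"\w+", text) if w in words_wanted]
    fixed.write(row, 1, name)
    fixed.write(row, 2, ", ".join(matches))

print(stale.cells)
print(fixed.cells)
```

With a real xlwt sheet the same idea would presumably apply: compute the row inside the loop that reads the .txt files, and save the workbook once after the loop instead of on every match.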