pandas loc on str lower for column comparison
Sayth Renshaw <flebber.crue at gmail.com> writes:
> I am importing 4 columns into a dataframe from a spreadsheet.
> My goal is to create a 5th column with TRUE or False if column 4 (str) matches column 3.
> Trying to leverage this answer https://stackoverflow.com/a/35940955/461887
> This is my code
> import pandas as pd
> xls = pd.ExcelFile("Melbourne.xlsx")
> df = xls.parse('Sheet1', skiprows= 4)
> df1 = df[['UID','Name','New Leader','Current Team', 'New Team']]
> df1['Difference'] = df1['Current
> Team'].str.lower().str.replace('s/+',"") == df1['New
> Which gives this error
> A value is trying to be set on a copy of a slice from a DataFrame.
I think what happens is that df1 is not a copy of the subset of df that you want, but it is a VIEW on df instead. This means that df1 shares memory with df (for memory savings reasons). But if you would change this view by adding a column, where should it be put? In df? If so, where?
So the correct way to do this is to make df1 a copy rather than a view.
df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')]
And than it should work.
Except that the str.replace is wrong for what you want. It just replaces the literal string "s/+" with an empty string instead of white space. This was wrong in the stackoverflow post.
To replace whitespace it should be str.replace('\\s+',"", regex=True). But simpler is to use str.strip():
df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip()
Piet van Oostrum <piet-l at vanoostrum.org>
PGP key: [8DAE142BE17999C4]