osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

pandas split and melt()


> 
> Since I didn't find a cool shortcut I decided to use brute force:
> 
> $ cat pandas_explode_column.py
> import pandas as pd
> 
> df = pd.DataFrame(
>     [
>         [
>             "2019-06-21 11:15:00",
>             "WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE, Colem;#17230;"
>         ],
>         [
>             "2019-06-22 10:00:00", "Doe, John;#42;Robbins, Rita;"
>         ]
>     ],
>     columns=["Session date", "Consultant"]
> )
> 
> def explode_consultants(consultants):
>         consultants = (c.lstrip("#") for c in consultants.split(";"))
>         return (c for c in consultants if c.strip("0123456789"))
> 
> def explode_column(df, column, split):
>     for _index, row in df.iterrows():
>         for part in split(row[column]):
>             yield [part if c == column else row[c] for c in df.columns]
> 
> def explode(df, column, split):
>     return pd.DataFrame(
>         explode_column(df, "Consultant", split), columns=df.columns
>     )
> 
> df2 = explode(df, "Consultant", explode_consultants)
> 
> print(df)
> print(df2)
> $ python3 pandas_explode_column.py
>           Session date                                         Consultant
> 0  2019-06-21 11:15:00  WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE...
> 1  2019-06-22 10:00:00                       Doe, John;#42;Robbins, Rita;
> 
> [2 rows x 2 columns]
>           Session date     Consultant
> 0  2019-06-21 11:15:00  WNEWSKI, Joan
> 1  2019-06-21 11:15:00    BALIN, Jock
> 2  2019-06-21 11:15:00    DUNE, Colem
> 3  2019-06-22 10:00:00      Doe, John
> 4  2019-06-22 10:00:00  Robbins, Rita
> 
> [5 rows x 2 columns]
> $

Mind a little blown :-). Going to have to play and break this several times to fully get it. 

Thanks

Sayth