Parse or explode a list of dictionaries in a DataFrame

I have a DataFrame whose columns contain lists of nested dictionaries that I need to unpack.

I need to get the date and price from priceHistory, plus the items listed in WaterConservation and EnergyEfficient. The example below is just two rows of a much larger DataFrame, and the number of dictionary elements differs from row to row.

import pandas as pd

df = pd.DataFrame(
    [[19, [{'priceChangeRate': 0, 'date': '2015-05-29', 'source': 'Public Record', 'postingIsRental': False, 'time': 1432857600000, 'sellerAgent': None, 'showCountyLink': False, 'attributeSource': {'infoString2': 'Public Record', 'infoString3': None, 'infoString1': None}, 'pricePerSquareFoot': 275, 'buyerAgent': None, 'event': 'Sold', 'price': 877205}], ['Low flow commode', 'Low flow fixtures', 'Water-Smart Landscaping'],''],
     [89, [{'priceChangeRate': 0.090909090909091, 'date': '2023-07-14', 'source': 'Public Record', 'postingIsRental': False, 'time': 1689292800000, 'sellerAgent': {'name': 'seller1', 'photo': {'url': 'https://sellerphoto1.jpg'}, 'profileUrl': '/profile/sellerprofile1/'}, 'showCountyLink': False, 'attributeSource': {'infoString2': 'Public Record', 'infoString3': None, 'infoString1': None}, 'pricePerSquareFoot': 308, 'buyerAgent': {'name': 'buyer1', 'photo': {'url': 'https://buyerphoto1.jpg'}, 'profileUrl': '/profile/buyerprofile1/'}, 'event': 'Sold', 'price': 1200000}, {'priceChangeRate': 0, 'date': '2015-08-20', 'source': 'Public Record', 'postingIsRental': False, 'time': 1440028800000, 'sellerAgent': None, 'showCountyLink': False, 'attributeSource': {'infoString2': 'Public Record', 'infoString3': None, 'infoString1': None}, 'pricePerSquareFoot': 50, 'buyerAgent': None, 'event': 'Sold', 'price': 195000}],'', ['Windows', 'Insulation', 'HVAC', 'Appliances', 'Lighting']]],
    columns=['id', 'priceHistory', 'WaterConservation', 'EnergyEfficient'])

I have tried too many things to list here, but this one seems the most promising (just to get priceHistory) (source):

df = pd.concat(
    [
        df,
        df.pop("priceHistory").apply(
            lambda x: pd.Series({k: v for d in x for k, v in d.items()})
        ),
    ],
    axis=1,
)
print(df)

But I get this error: TypeError: 'float' object is not iterable

What is your expected output?

— 13.08.2024 02:25

@iBeMeltin, I need the data in lists for further analysis. I should have stated that in the original post. The solution posted below works for my needs.

— 13.08.2024 15:27

pandas dataframe nested pandas-explode

13.08.2024 01:33


2 Answers

You can use pd.json_normalize to extract the date and price from priceHistory. Then drop the priceHistory column if you no longer need it, and join date and price to the main df.

For example:

import pandas as pd

df = pd.DataFrame(
    [[19, [{'priceChangeRate': 0, 'date': '2015-05-29', 'source': 'Public Record', 'postingIsRental': False, 'time': 1432857600000, 'sellerAgent': None, 'showCountyLink': False, 'attributeSource': {'infoString2': 'Public Record', 'infoString3': None, 'infoString1': None}, 'pricePerSquareFoot': 275, 'buyerAgent': None, 'event': 'Sold', 'price': 877205}], ['Low flow commode', 'Low flow fixtures', 'Water-Smart Landscaping'],''],
     [89, [{'priceChangeRate': 0.090909090909091, 'date': '2023-07-14', 'source': 'Public Record', 'postingIsRental': False, 'time': 1689292800000, 'sellerAgent': {'name': 'seller1', 'photo': {'url': 'https://sellerphoto1.jpg'}, 'profileUrl': '/profile/sellerprofile1/'}, 'showCountyLink': False, 'attributeSource': {'infoString2': 'Public Record', 'infoString3': None, 'infoString1': None}, 'pricePerSquareFoot': 308, 'buyerAgent': {'name': 'buyer1', 'photo': {'url': 'https://buyerphoto1.jpg'}, 'profileUrl': '/profile/buyerprofile1/'}, 'event': 'Sold', 'price': 1200000}, {'priceChangeRate': 0, 'date': '2015-08-20', 'source': 'Public Record', 'postingIsRental': False, 'time': 1440028800000, 'sellerAgent': None, 'showCountyLink': False, 'attributeSource': {'infoString2': 'Public Record', 'infoString3': None, 'infoString1': None}, 'pricePerSquareFoot': 50, 'buyerAgent': None, 'event': 'Sold', 'price': 195000}],'', ['Windows', 'Insulation', 'HVAC', 'Appliances', 'Lighting']]],
    columns=['id', 'priceHistory', 'WaterConservation', 'EnergyEfficient'])

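# Explode to one row per history entry, keeping the original row labels
exploded = df['priceHistory'].explode()
price_history_df = pd.json_normalize(exploded.tolist(), sep='_').set_index(exploded.index)

# Join on the original row labels so each entry stays with its own id
df = df.drop('priceHistory', axis=1).join(price_history_df[['date', 'price']], how='left')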
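
The TypeError in the question ('float' object is not iterable) suggests that some rows of the larger frame hold NaN instead of a list in priceHistory. That is an assumption, since those rows are not shown. A minimal sketch under that assumption, using trimmed dictionaries and a made-up third row (id 7) with no history, filters out everything that is not a dict before normalizing:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: the row with id 7 has no price history (NaN).
df = pd.DataFrame({
    "id": [19, 89, 7],
    "priceHistory": [
        [{"date": "2015-05-29", "price": 877205}],
        [{"date": "2023-07-14", "price": 1200000},
         {"date": "2015-08-20", "price": 195000}],
        np.nan,
    ],
})

# One row per history entry; the original row label is kept in the index.
s = df["priceHistory"].explode()

# Keep only real dict entries (this drops NaN placeholders).
s = s[s.map(lambda v: isinstance(v, dict))]

# Flatten the dicts and restore the original row labels.
prices = pd.json_normalize(s.tolist()).set_index(s.index)[["date", "price"]]

# A left join keeps rows without history; their date and price become NaN.
out = df.drop(columns="priceHistory").join(prices, how="left")
print(out)
```

Rows with several history entries are repeated once per entry, and the row without history is kept with missing values rather than raising an error.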

13.08.2024 11:43

Accepted answer

You can use Series.explode with json_normalize, then restore the original index with DataFrame.set_index so that DataFrame.join can be used:

s = df.pop('priceHistory').explode()
out = df.join(pd.json_normalize(s).set_index(s.index))

print (out)
   id                                  WaterConservation  \
0  19  [Low flow commode, Low flow fixtures, Water-Sm...   
1  89                                                      
1  89                                                      

                                     EnergyEfficient  priceChangeRate  \
0                                                            0.000000   
1  [Windows, Insulation, HVAC, Appliances, Lighting]         0.090909   
1  [Windows, Insulation, HVAC, Appliances, Lighting]         0.000000   

         date         source  postingIsRental           time  sellerAgent  \
0  2015-05-29  Public Record            False  1432857600000          NaN   
1  2023-07-14  Public Record            False  1689292800000          NaN   
1  2015-08-20  Public Record            False  1440028800000          NaN   

   showCountyLink  pricePerSquareFoot  buyerAgent event    price  \
0           False                 275         NaN  Sold   877205   
1           False                 308         NaN  Sold  1200000   
1           False                  50         NaN  Sold   195000   

  attributeSource.infoString2 attributeSource.infoString3  \
0               Public Record                        None   
1               Public Record                        None   
1               Public Record                        None   

  attributeSource.infoString1 sellerAgent.name     sellerAgent.photo.url  \
0                        None              NaN                       NaN   
1                        None          seller1  https://sellerphoto1.jpg   
1                        None              NaN                       NaN   

     sellerAgent.profileUrl buyerAgent.name     buyerAgent.photo.url  \
0                       NaN             NaN                      NaN   
1  /profile/sellerprofile1/          buyer1  https://buyerphoto1.jpg   
1                       NaN             NaN                      NaN   

     buyerAgent.profileUrl  
0                      NaN  
1  /profile/buyerprofile1/  
1                      NaN
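
The question also mentions the items in WaterConservation and EnergyEfficient, where the sample uses an empty string when a row has no items. As a rough sketch only, and assuming that any non-list value means "no items", the cells could be normalized so every one holds a list. The helper as_list and the variable list_cols are illustrative names, not part of the original post:

```python
import pandas as pd

# Trimmed copy of the sample data: '' marks a row with no items.
df = pd.DataFrame({
    "id": [19, 89],
    "WaterConservation": [
        ["Low flow commode", "Low flow fixtures", "Water-Smart Landscaping"],
        "",
    ],
    "EnergyEfficient": [
        "",
        ["Windows", "Insulation", "HVAC", "Appliances", "Lighting"],
    ],
})

list_cols = ["WaterConservation", "EnergyEfficient"]


def as_list(value):
    """Return value unchanged if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


# Make every cell a list so later code can iterate without type checks.
for col in list_cols:
    df[col] = df[col].map(as_list)

# Optional long format: one row per (id, item); empty lists explode to NaN
# and are dropped here.
water = df[["id", "WaterConservation"]].explode("WaterConservation").dropna()
print(df)
print(water)
```

Keeping the values as lists matches the stated need for further analysis on lists, while the exploded frame is only there in case a one-item-per-row layout is more convenient.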

Thanks @jezrael, this works for what I need.

— 13.08.2024 15:26

13.08.2024 11:49