python - Pandas DataFrame: Accessing via composite index created by groupby operation
I want to calculate a group-specific ratio gathered from two datasets. The two dataframes are read from a database with:
leases = pd.read_sql_query(sql, connection)
sales = pd.read_sql_query(sql, connection)
One contains real estate offered for sale, the other rented objects. I group both of them by city and by the category I'm interested in:
leasegroups = leases.groupby(['idconjugate', "city"])
salegroups = sales.groupby(['idconjugate', "city"])
Now I want to know the ratio between the cheapest rental object per category and city and the most expensively sold object, to obtain a lower bound on the possible return:
minlease = leasegroups['price'].min()
maxsale = salegroups['price'].max()
ratios = minlease*12/maxsale
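To make the shape of the result concrete, here is a minimal sketch of the steps above. The two small frames are made-up stand-ins for the SQL query results (an assumption, not the real data); only the column names `idconjugate`, `city` and `price` come from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the two SQL query results.
leases = pd.DataFrame({
    "idconjugate": [1, 1, 1, 2],
    "city": ["chelles", "chelles", "foix", "foix"],
    "price": [519, 600, 350, 400],
})
sales = pd.DataFrame({
    "idconjugate": [1, 1, 2],
    "city": ["chelles", "foix", "esbly"],
    "price": [129000, 58000, 117990],
})

minlease = leases.groupby(["idconjugate", "city"])["price"].min()
maxsale = sales.groupby(["idconjugate", "city"])["price"].max()

# Both results are Series indexed by a two-level MultiIndex, so the
# arithmetic aligns on (idconjugate, city) pairs. Pairs that exist on
# only one side come out as NaN.
ratios = minlease * 12 / maxsale

print(ratios.index.names)
print(ratios)
```

The grouping keys end up as index levels rather than columns, which is why the result cannot be addressed with plain column lookups.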
I get output like: category - city: ratio, but I cannot access the ratio object by city nor by category. I tried creating a new dataframe with:
newframe = pd.DataFrame({"minleases" : minlease, "maxsales" : maxsale, "ratios" : ratios})
newframe = newframe.loc[newframe['ratios'].notnull()]
which gives me the correct rows, and newframe.index returns the groups.
newframe.index.names gives ['idconjugate', 'city'], but indexing with those names results in a KeyError. How can I make an index out of the different groups: id0+city1, id0+city2 etc.?
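For reference, a minimal sketch of how a frame with this kind of two-level index could be addressed. The frame is built by hand here, and the row for category 2 is made up for illustration; `.loc`, `.xs` and `.reset_index` are standard pandas calls, but whether they fit the real data is an assumption.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question.
index = pd.MultiIndex.from_tuples(
    [(1, "chelles"), (1, "foix"), (2, "foix")],
    names=["idconjugate", "city"],
)
newframe = pd.DataFrame(
    {"minleases": [519, 350, 400], "maxsales": [129000, 58000, 90000]},
    index=index,
)
newframe["ratios"] = newframe["minleases"] * 12 / newframe["maxsales"]

# The level names are not columns, so a column lookup fails.
try:
    newframe["city"]
    raised = False
except KeyError:
    raised = True

# Label-based access through the index instead:
one_row = newframe.loc[(1, "foix"), :]         # one (idconjugate, city) pair
one_category = newframe.loc[1]                 # all cities in category 1
one_city = newframe.xs("foix", level="city")   # one city across categories

# reset_index() turns both levels into ordinary columns.
flat = newframe.reset_index()
```

After `reset_index()`, `flat["city"]` and `flat["idconjugate"]` work as normal column lookups.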
Edit: the output looks like this:
                               maxsales  minleases    ratios
idconjugate city
1           argeles gazost        59500        337  0.067966
            chelles              129000        519  0.048279
            enghien-les-bains    143000        696  0.058406
            esbly                117990        495  0.050343
            foix                  58000        350  0.072414
The goal is to select the top ratios and plot them with Bokeh, which takes a dataframe object and plots a column versus the index, as I understand it:
topselect = ratio.loc[ratio["ratios"] > ratio["ratios"].quantile(quant)]
dots = Dot(topselect, values='ratios', label=topselect.index, tools=[hover,],
           title="{}% best minimal lease/sale ratios per city and group".format(topperc*100),
           width=600)
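The quantile filter on its own can be sketched as follows, with made-up ratios. The relation `quant = 1 - topperc` is my assumption about how the two names in the snippet fit together; the question does not state it.

```python
import pandas as pd

# Hypothetical ratios with the same two-level index as in the question.
ratio = pd.DataFrame(
    {"ratios": [0.05, 0.07, 0.04, 0.06, 0.08]},
    index=pd.MultiIndex.from_tuples(
        [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a")],
        names=["idconjugate", "city"],
    ),
)

topperc = 0.4          # assumed: keep roughly the top 40 percent
quant = 1 - topperc    # assumed relation between the two names

# Rows strictly above the quantile threshold are kept; the boolean mask
# leaves the MultiIndex and the original row order untouched.
threshold = ratio["ratios"].quantile(quant)
topselect = ratio.loc[ratio["ratios"] > threshold]

print(topselect)
```

The selection still carries (idconjugate, city) tuples as row labels, which is what the plotting call receives through `topselect.index`.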
I needed the index as a list in the original order, so the following worked:
ids = []
cities = []
# each index entry is an (idconjugate, city) tuple
for l in topselect.index:
    ids.append(str(int(l[0])))
    cities.append(l[1])
newind = [i + "_" + j for i, j in zip(ids, cities)]
topselect.index = newind
Now the plot shows 1_city1 ... 1_city2 ... n_cityx on the x-axis. I figure there must be an obvious way inside the pandas framework that I'm missing.
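One way the same flattening might be written inside pandas is `Index.map`, which applies a function to each index entry. This is only a sketch on a hand-built frame with two rows borrowed from the output above; it has not been tried against the real data or the Bokeh call.

```python
import pandas as pd

# Hypothetical selection; the float category ids mirror the int() cast
# used in the loop above.
topselect = pd.DataFrame(
    {"ratios": [0.072414, 0.067966]},
    index=pd.MultiIndex.from_tuples(
        [(1.0, "foix"), (1.0, "argeles gazost")],
        names=["idconjugate", "city"],
    ),
)

# Index.map calls the function once per (idconjugate, city) tuple and
# keeps the original row order, so no helper lists are needed.
topselect.index = topselect.index.map(
    lambda t: "{}_{}".format(int(t[0]), t[1])
)

print(list(topselect.index))
```

Because the mapped values are plain strings rather than tuples, the result is a flat index of labels such as 1_foix.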