Pandas Aggregate/Group by based on most recent date
I have a DataFrame as follows, where Id is a string and Date is a datetime:

    Id  Date
    1   3-1-2012
    1   4-8-2013
    2   1-17-2013
    2   5-4-2013
    2   10-30-2012
    3   1-3-2013

I'd like to consolidate the table to show just one row for each Id: the one with the most recent date. Any thoughts on how to do this?
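To reproduce the example, the table can be built and the date strings parsed with pd.to_datetime. This is a minimal sketch that assumes the dates are month-day-year, as the 1-17-2013 and 10-30-2012 rows imply; the variable name df is just for illustration.

```python
import pandas as pd

# Sample data from the question: Id is a string, Date is month-day-year text.
df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": ["3-1-2012", "4-8-2013", "1-17-2013", "5-4-2013", "10-30-2012", "1-3-2013"],
})

# Parse the strings into datetime values; the explicit format avoids any
# month-first/day-first ambiguity (so "3-1-2012" becomes March 1st, 2012).
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")
print(df)
```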
You can groupby the Id field:

    In : df
    Out:
       Id                 Date
    0   1  2012-03-01 00:00:00
    1   1  2013-04-08 00:00:00
    2   2  2013-01-17 00:00:00
    3   2  2013-05-04 00:00:00
    4   2  2012-10-30 00:00:00
    5   3  2013-01-03 00:00:00

    In : g = df.groupby('Id')

If you are not certain about the ordering, you could do something along these lines:

    In : g.agg(lambda x: x.iloc[x.Date.argmax()])
    Out:
                       Date
    Id
    1   2013-04-08 00:00:00
    2   2013-05-04 00:00:00
    3   2013-01-03 00:00:00

which, for each group, grabs the row with the largest (latest) date (the argmax part).

If you knew they were in order, you could take the last (or first) entry:

    In : g.last()
    Out:
                       Date
    Id
    1   2013-04-08 00:00:00
    2   2012-10-30 00:00:00
    3   2013-01-03 00:00:00

(Note: they're not in order, so this doesn't work in this case!)
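The behaviour of groupby.agg and Series.argmax has changed across pandas versions, so the lambda above may not run unchanged on a current install. A version-independent way to get the same result is to let idxmax find, per Id, the index label of the latest Date and then select those rows with .loc. The second line is an alternative that makes the g.last() idea work by sorting first. This is a sketch on the sample data; the names latest and by_sort are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(
        ["2012-03-01", "2013-04-08", "2013-01-17", "2013-05-04", "2012-10-30", "2013-01-03"]
    ),
})

# idxmax returns, for each Id, the index *label* of the row with the latest Date;
# .loc then picks exactly those rows, whatever the index looks like.
latest = df.loc[df.groupby("Id")["Date"].idxmax()]

# Alternative: sort by Date so the most recent row is last in every group,
# after which groupby(...).last() is safe to use.
by_sort = df.sort_values("Date").groupby("Id").last()

print(latest)
print(by_sort)
```

The idxmax version keeps the original rows (and their index labels), while the sorted last() version returns a frame indexed by Id, like the outputs shown above.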
In Hayden's answer, I think using x.loc in place of x.iloc is better, because in the pandas version used there argmax returns an index label rather than a position, and the index of the df DataFrame could be sparse (not a plain 0..n-1 range). In that case iloc will not work. (I don't have enough reputation on Stack Overflow to post this as a comment on that answer.)
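The label-versus-position distinction can be seen directly on a frame with a sparse index. Since pandas 1.0, argmax returns a position and idxmax returns a label; the point is to pair each with the matching indexer. A small sketch; the index values 10, 20, 30 are made up for illustration.

```python
import pandas as pd

# A frame whose index is "sparse": labels 10, 20, 30 instead of 0, 1, 2.
df = pd.DataFrame(
    {"Id": ["1", "1", "2"],
     "Date": pd.to_datetime(["2012-03-01", "2013-04-08", "2013-01-17"])},
    index=[10, 20, 30],
)

group = df[df["Id"] == "1"]          # the rows labelled 10 and 20

label = group["Date"].idxmax()       # index label of the latest date (20)
position = group["Date"].argmax()    # position within the group (1), pandas >= 1.0

# A label goes with .loc and a position with .iloc; group.iloc[label]
# would raise IndexError here, since the group only has two rows.
row_by_label = group.loc[label]
row_by_position = group.iloc[position]
print(row_by_label)
```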