

Pandas Aggregate/Group by based on most recent date


I have a DataFrame as follows, where Id is a string and Date is a datetime:
Id Date
1 3-1-2012
1 4-8-2013
2 1-17-2013
2 5-4-2013
2 10-30-2012
3 1-3-2013
I'd like to consolidate the table so that it shows just one row per Id: the row with the most recent date.
Any thoughts on how to do this?
You can groupby the Id field:
In [11]: df
Out[11]:
  Id                Date
0  1 2012-03-01 00:00:00
1  1 2013-04-08 00:00:00
2  2 2013-01-17 00:00:00
3  2 2013-05-04 00:00:00
4  2 2012-10-30 00:00:00
5  3 2013-01-03 00:00:00
In [12]: g = df.groupby('Id')
If you are not certain about the ordering, you could do something along these lines:
In [13]: g.agg(lambda x: x.iloc[x.Date.argmax()])
Out[13]:
                  Date
Id
1  2013-04-08 00:00:00
2  2013-05-04 00:00:00
3  2013-01-03 00:00:00
which, for each group, grabs the row with the largest (latest) date (the argmax part).
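On more recent pandas versions, the same "row with the latest date per group" idea is often written with idxmax and .loc instead of a lambda inside agg. Here is a minimal sketch of that variant, assuming the question's data is rebuilt by hand; the names latest_labels and latest are only illustrative and are not from the original answer:

```python
import pandas as pd

# Rebuild the question's table: Id is a string, Date is a datetime.
df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime([
        "2012-03-01", "2013-04-08", "2013-01-17",
        "2013-05-04", "2012-10-30", "2013-01-03",
    ]),
})

# For each Id, idxmax gives the index label of the row holding the latest Date.
latest_labels = df.groupby("Id")["Date"].idxmax()

# .loc selects by label, so the full rows (all columns) are returned.
latest = df.loc[latest_labels].reset_index(drop=True)
print(latest)
```

Because whole rows are selected, any additional columns in the frame come along with the latest date, which is usually what you want when consolidating.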
If you knew they were in order you could take the last (or first) entry:
In [14]: g.last()
Out[14]:
                  Date
Id
1  2013-04-08 00:00:00
2  2012-10-30 00:00:00
3  2013-01-03 00:00:00
(Note: they're not in order, so this doesn't work in this case!)
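If you want to use last() anyway, one option is to sort by Date first so that the last row of every group really is the most recent one. This is a rough sketch, assuming the same data as in the question (rebuilt here by hand) and a pandas version that has sort_values:

```python
import pandas as pd

# Rebuild the question's table: Id is a string, Date is a datetime.
df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime([
        "2012-03-01", "2013-04-08", "2013-01-17",
        "2013-05-04", "2012-10-30", "2013-01-03",
    ]),
})

# Sort by Date first; groupby keeps the row order within each group,
# so last() now picks the most recent date for every Id.
latest = df.sort_values("Date").groupby("Id").last()
print(latest)
```

With the sort in place, Id 2 comes out as 2013-05-04 rather than the 2012-10-30 shown above.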
In Hayden's answer, I think it is better to use x.loc in place of x.iloc, because the index of the df DataFrame could be sparse (non-contiguous), in which case iloc will not work.
(I don't have enough reputation on Stack Overflow to post this as a comment on that answer.)
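To illustrate the label versus position point, here is a small sketch with a deliberately non-contiguous index. The index labels 10 to 60 are made up for the example. As far as I know, whether Series.argmax returns a label or a position has changed between pandas versions, so the sketch uses idxmax for the label and NumPy's argmax for the position to keep the two unambiguous:

```python
import pandas as pd

# Same data as the question, but with a sparse index (hypothetical labels).
df = pd.DataFrame(
    {
        "Id": ["1", "1", "2", "2", "2", "3"],
        "Date": pd.to_datetime([
            "2012-03-01", "2013-04-08", "2013-01-17",
            "2013-05-04", "2012-10-30", "2013-01-03",
        ]),
    },
    index=[10, 20, 30, 40, 50, 60],
)

group = df[df["Id"] == "2"]  # three rows, labelled 30, 40 and 50

label = group["Date"].idxmax()                      # index label of the latest date
position = int(group["Date"].to_numpy().argmax())   # position inside the group

by_label = group.loc[label]          # a label goes with .loc
by_position = group.iloc[position]   # a position goes with .iloc

print(label, position)
print(by_label)
```

Here the label is 40 while the position is 1. Passing the label 40 to .iloc on a three-row group would raise an IndexError, which is the failure described above.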
