import pandas as pd
Date and time variables in Pandas
23 March 2019
This notebook collects my current understanding of how to work with time variables in Pandas
References: - http://strftime.org/ - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html - https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
Intro
Let’s start with some Pandas series that consist of strings with dates and times in different formats
= pd.Series(['3/5/1972', '24/5/1984', '1/11/1977'])
dates1 = pd.Series(['17 May 1982', '12 June 2016'])
dates2 = pd.Series(['23/5/1984 17:31:07'])
dates3 = pd.Series(['8 November 1948', '3/6/1983'])
dates4 = pd.Series(['12 Nov 1912', '8 July 1723', '3/7/1982', '12 May, 1932']) dates5
Now we convert each of these to a Pandas datetime series using the to_datetime method. We specify the input along with the format that the datetime values are expressed in.
= pd.to_datetime(dates1, format="%d/%m/%Y") # day, month, year
dates1_pd = pd.to_datetime(dates2, format="%d %B %Y") # day, month name, year
dates2_pd # day, month year and hour, minute, seconds
= pd.to_datetime(dates3, format="%d/%m/%Y %H:%M:%S")
dates3_pd # infer the datetime format based on the structure of the input string
= pd.to_datetime(dates4, infer_datetime_format=True, dayfirst=True)
dates4_pd = pd.to_datetime(dates5, infer_datetime_format=True, dayfirst=True)
dates5_pd
print(dates1_pd)
print(dates2_pd)
print(dates3_pd)
print(dates4_pd)
print(dates5_pd)
0 1972-05-03
1 1984-05-24
2 1977-11-01
dtype: datetime64[ns]
0 1982-05-17
1 2016-06-12
dtype: datetime64[ns]
0 1984-05-23 17:31:07
dtype: datetime64[ns]
0 1948-11-08
1 1983-06-03
dtype: datetime64[ns]
0 1912-11-12
1 1723-07-08
2 1982-07-03
3 1932-05-12
dtype: datetime64[ns]
Working with Timestamps and extracting time information
Rather than writing a datetime as a string then converting it to a datetime type, we can create a series of datetime values directly using the Timestamp function. We can specifiy the year, month, day, hour, minute and second.
Below we do this using keyword arguments.
= pd.Series([pd.Timestamp(year=1972, day=3, month=5), pd.Timestamp(year=1984, month=5, day=24),
dates6 =1977, month=11, day=1)]) pd.Timestamp(year
Here we specify a datetime value another way, without used keyword arguments.
1984, 5, 24, 9, 45, 21) pd.Timestamp(
Timestamp('1984-05-24 09:45:21')
The date_range method allows us to create a series of datetime values between a start and end date with a specified frequency, e.g., days, months, hours etc. Let’s a series consisting of all days between the release of Ghostbusters and today’s date.
= pd.date_range(start='11/15/1984', end='23/03/2019', freq='D') date_range_egon
date_range_egon
DatetimeIndex(['1984-11-15', '1984-11-16', '1984-11-17', '1984-11-18',
'1984-11-19', '1984-11-20', '1984-11-21', '1984-11-22',
'1984-11-23', '1984-11-24',
...
'2019-03-14', '2019-03-15', '2019-03-16', '2019-03-17',
'2019-03-18', '2019-03-19', '2019-03-20', '2019-03-21',
'2019-03-22', '2019-03-23'],
dtype='datetime64[ns]', length=12547, freq='D')
We can put this data into a data frame and then extract the day and month names, using the day_name() and month_name() methods.
# convert to Pandas dataframe
= pd.DataFrame(data={'Date': date_range_egon, 'Day': date_range_egon.day_name(),
df 'Month': date_range_egon.month_name()})
It’s sometimes useful to set the index of the dataframe to the datetime value. This then allows us to slice the dataframe based on an interval of date values.
# set the index to datetime value
= df.set_index('Date')
df
# slice the dataframe based on time period
'1987-12-01':'1987-12-24'] df[
Day | Month | |
---|---|---|
Date | ||
1987-12-01 | Tuesday | December |
1987-12-02 | Wednesday | December |
1987-12-03 | Thursday | December |
1987-12-04 | Friday | December |
1987-12-05 | Saturday | December |
1987-12-06 | Sunday | December |
1987-12-07 | Monday | December |
1987-12-08 | Tuesday | December |
1987-12-09 | Wednesday | December |
1987-12-10 | Thursday | December |
1987-12-11 | Friday | December |
1987-12-12 | Saturday | December |
1987-12-13 | Sunday | December |
1987-12-14 | Monday | December |
1987-12-15 | Tuesday | December |
1987-12-16 | Wednesday | December |
1987-12-17 | Thursday | December |
1987-12-18 | Friday | December |
1987-12-19 | Saturday | December |
1987-12-20 | Sunday | December |
1987-12-21 | Monday | December |
1987-12-22 | Tuesday | December |
1987-12-23 | Wednesday | December |
1987-12-24 | Thursday | December |
So which day of the week was Ghostbusters released on?
'1984-11-15':'1984-11-15'] df[
Day | Month | |
---|---|---|
Date | ||
1984-11-15 | Thursday | November |
Using accessors to extract days, years and months
Once a field has been converted to a datetime series, we can extract the day, year, month etc using accessor methods. I’m still learning about these. A nice explanation is available here: https://realpython.com/python-pandas-tricks/
# given a dataframe of datetime like values...
= pd.to_datetime(dates3, format="%d/%m/%Y %H:%M:%S")
dt1
# extract the time components using the dt accessor
print(dt1)
print(dt1.dt.minute)
print(dt1.dt.hour)
print(dt1.dt.day)
print(dt1.dt.day_name())
print(dt1.dt.month)
print(dt1.dt.month_name())
print(dt1.dt.year)
print(dt1.dt.quarter)
0 1984-05-23 17:31:07
dtype: datetime64[ns]
0 31
dtype: int64
0 17
dtype: int64
0 23
dtype: int64
0 Wednesday
dtype: object
0 5
dtype: int64
0 May
dtype: object
0 1984
dtype: int64
0 2
dtype: int64
Time differences
Taking the difference between two Timestamp objects returns a Timedelta object.
# the number of days between when the date that Ghostbusters was released and today
= pd.Timestamp(2019, 3, 23)
time1 = pd.Timestamp(1984, 11, 15)
time2
# returns a Timedelta object
= (time1 - time2) time_delta1
# how many days difference. Returns an integer
print(time_delta1.days)
# how many years difference?
print(time_delta1.days / 365)
12546
34.37260273972603
Timezones
Finally let’s look at timezones.
from pytz import all_timezones
# All the different timezones we can use
0:390] all_timezones[
['Africa/Abidjan',
'Africa/Accra',
'Africa/Addis_Ababa',
'Africa/Algiers',
'Africa/Asmara',
'Africa/Asmera',
'Africa/Bamako',
'Africa/Bangui',
'Africa/Banjul',
'Africa/Bissau',
'Africa/Blantyre',
'Africa/Brazzaville',
'Africa/Bujumbura',
'Africa/Cairo',
'Africa/Casablanca',
'Africa/Ceuta',
'Africa/Conakry',
'Africa/Dakar',
'Africa/Dar_es_Salaam',
'Africa/Djibouti',
'Africa/Douala',
'Africa/El_Aaiun',
'Africa/Freetown',
'Africa/Gaborone',
'Africa/Harare',
'Africa/Johannesburg',
'Africa/Juba',
'Africa/Kampala',
'Africa/Khartoum',
'Africa/Kigali',
'Africa/Kinshasa',
'Africa/Lagos',
'Africa/Libreville',
'Africa/Lome',
'Africa/Luanda',
'Africa/Lubumbashi',
'Africa/Lusaka',
'Africa/Malabo',
'Africa/Maputo',
'Africa/Maseru',
'Africa/Mbabane',
'Africa/Mogadishu',
'Africa/Monrovia',
'Africa/Nairobi',
'Africa/Ndjamena',
'Africa/Niamey',
'Africa/Nouakchott',
'Africa/Ouagadougou',
'Africa/Porto-Novo',
'Africa/Sao_Tome',
'Africa/Timbuktu',
'Africa/Tripoli',
'Africa/Tunis',
'Africa/Windhoek',
'America/Adak',
'America/Anchorage',
'America/Anguilla',
'America/Antigua',
'America/Araguaina',
'America/Argentina/Buenos_Aires',
'America/Argentina/Catamarca',
'America/Argentina/ComodRivadavia',
'America/Argentina/Cordoba',
'America/Argentina/Jujuy',
'America/Argentina/La_Rioja',
'America/Argentina/Mendoza',
'America/Argentina/Rio_Gallegos',
'America/Argentina/Salta',
'America/Argentina/San_Juan',
'America/Argentina/San_Luis',
'America/Argentina/Tucuman',
'America/Argentina/Ushuaia',
'America/Aruba',
'America/Asuncion',
'America/Atikokan',
'America/Atka',
'America/Bahia',
'America/Bahia_Banderas',
'America/Barbados',
'America/Belem',
'America/Belize',
'America/Blanc-Sablon',
'America/Boa_Vista',
'America/Bogota',
'America/Boise',
'America/Buenos_Aires',
'America/Cambridge_Bay',
'America/Campo_Grande',
'America/Cancun',
'America/Caracas',
'America/Catamarca',
'America/Cayenne',
'America/Cayman',
'America/Chicago',
'America/Chihuahua',
'America/Coral_Harbour',
'America/Cordoba',
'America/Costa_Rica',
'America/Creston',
'America/Cuiaba',
'America/Curacao',
'America/Danmarkshavn',
'America/Dawson',
'America/Dawson_Creek',
'America/Denver',
'America/Detroit',
'America/Dominica',
'America/Edmonton',
'America/Eirunepe',
'America/El_Salvador',
'America/Ensenada',
'America/Fort_Nelson',
'America/Fort_Wayne',
'America/Fortaleza',
'America/Glace_Bay',
'America/Godthab',
'America/Goose_Bay',
'America/Grand_Turk',
'America/Grenada',
'America/Guadeloupe',
'America/Guatemala',
'America/Guayaquil',
'America/Guyana',
'America/Halifax',
'America/Havana',
'America/Hermosillo',
'America/Indiana/Indianapolis',
'America/Indiana/Knox',
'America/Indiana/Marengo',
'America/Indiana/Petersburg',
'America/Indiana/Tell_City',
'America/Indiana/Vevay',
'America/Indiana/Vincennes',
'America/Indiana/Winamac',
'America/Indianapolis',
'America/Inuvik',
'America/Iqaluit',
'America/Jamaica',
'America/Jujuy',
'America/Juneau',
'America/Kentucky/Louisville',
'America/Kentucky/Monticello',
'America/Knox_IN',
'America/Kralendijk',
'America/La_Paz',
'America/Lima',
'America/Los_Angeles',
'America/Louisville',
'America/Lower_Princes',
'America/Maceio',
'America/Managua',
'America/Manaus',
'America/Marigot',
'America/Martinique',
'America/Matamoros',
'America/Mazatlan',
'America/Mendoza',
'America/Menominee',
'America/Merida',
'America/Metlakatla',
'America/Mexico_City',
'America/Miquelon',
'America/Moncton',
'America/Monterrey',
'America/Montevideo',
'America/Montreal',
'America/Montserrat',
'America/Nassau',
'America/New_York',
'America/Nipigon',
'America/Nome',
'America/Noronha',
'America/North_Dakota/Beulah',
'America/North_Dakota/Center',
'America/North_Dakota/New_Salem',
'America/Ojinaga',
'America/Panama',
'America/Pangnirtung',
'America/Paramaribo',
'America/Phoenix',
'America/Port-au-Prince',
'America/Port_of_Spain',
'America/Porto_Acre',
'America/Porto_Velho',
'America/Puerto_Rico',
'America/Punta_Arenas',
'America/Rainy_River',
'America/Rankin_Inlet',
'America/Recife',
'America/Regina',
'America/Resolute',
'America/Rio_Branco',
'America/Rosario',
'America/Santa_Isabel',
'America/Santarem',
'America/Santiago',
'America/Santo_Domingo',
'America/Sao_Paulo',
'America/Scoresbysund',
'America/Shiprock',
'America/Sitka',
'America/St_Barthelemy',
'America/St_Johns',
'America/St_Kitts',
'America/St_Lucia',
'America/St_Thomas',
'America/St_Vincent',
'America/Swift_Current',
'America/Tegucigalpa',
'America/Thule',
'America/Thunder_Bay',
'America/Tijuana',
'America/Toronto',
'America/Tortola',
'America/Vancouver',
'America/Virgin',
'America/Whitehorse',
'America/Winnipeg',
'America/Yakutat',
'America/Yellowknife',
'Antarctica/Casey',
'Antarctica/Davis',
'Antarctica/DumontDUrville',
'Antarctica/Macquarie',
'Antarctica/Mawson',
'Antarctica/McMurdo',
'Antarctica/Palmer',
'Antarctica/Rothera',
'Antarctica/South_Pole',
'Antarctica/Syowa',
'Antarctica/Troll',
'Antarctica/Vostok',
'Arctic/Longyearbyen',
'Asia/Aden',
'Asia/Almaty',
'Asia/Amman',
'Asia/Anadyr',
'Asia/Aqtau',
'Asia/Aqtobe',
'Asia/Ashgabat',
'Asia/Ashkhabad',
'Asia/Atyrau',
'Asia/Baghdad',
'Asia/Bahrain',
'Asia/Baku',
'Asia/Bangkok',
'Asia/Barnaul',
'Asia/Beirut',
'Asia/Bishkek',
'Asia/Brunei',
'Asia/Calcutta',
'Asia/Chita',
'Asia/Choibalsan',
'Asia/Chongqing',
'Asia/Chungking',
'Asia/Colombo',
'Asia/Dacca',
'Asia/Damascus',
'Asia/Dhaka',
'Asia/Dili',
'Asia/Dubai',
'Asia/Dushanbe',
'Asia/Famagusta',
'Asia/Gaza',
'Asia/Harbin',
'Asia/Hebron',
'Asia/Ho_Chi_Minh',
'Asia/Hong_Kong',
'Asia/Hovd',
'Asia/Irkutsk',
'Asia/Istanbul',
'Asia/Jakarta',
'Asia/Jayapura',
'Asia/Jerusalem',
'Asia/Kabul',
'Asia/Kamchatka',
'Asia/Karachi',
'Asia/Kashgar',
'Asia/Kathmandu',
'Asia/Katmandu',
'Asia/Khandyga',
'Asia/Kolkata',
'Asia/Krasnoyarsk',
'Asia/Kuala_Lumpur',
'Asia/Kuching',
'Asia/Kuwait',
'Asia/Macao',
'Asia/Macau',
'Asia/Magadan',
'Asia/Makassar',
'Asia/Manila',
'Asia/Muscat',
'Asia/Nicosia',
'Asia/Novokuznetsk',
'Asia/Novosibirsk',
'Asia/Omsk',
'Asia/Oral',
'Asia/Phnom_Penh',
'Asia/Pontianak',
'Asia/Pyongyang',
'Asia/Qatar',
'Asia/Qyzylorda',
'Asia/Rangoon',
'Asia/Riyadh',
'Asia/Saigon',
'Asia/Sakhalin',
'Asia/Samarkand',
'Asia/Seoul',
'Asia/Shanghai',
'Asia/Singapore',
'Asia/Srednekolymsk',
'Asia/Taipei',
'Asia/Tashkent',
'Asia/Tbilisi',
'Asia/Tehran',
'Asia/Tel_Aviv',
'Asia/Thimbu',
'Asia/Thimphu',
'Asia/Tokyo',
'Asia/Tomsk',
'Asia/Ujung_Pandang',
'Asia/Ulaanbaatar',
'Asia/Ulan_Bator',
'Asia/Urumqi',
'Asia/Ust-Nera',
'Asia/Vientiane',
'Asia/Vladivostok',
'Asia/Yakutsk',
'Asia/Yangon',
'Asia/Yekaterinburg',
'Asia/Yerevan',
'Atlantic/Azores',
'Atlantic/Bermuda',
'Atlantic/Canary',
'Atlantic/Cape_Verde',
'Atlantic/Faeroe',
'Atlantic/Faroe',
'Atlantic/Jan_Mayen',
'Atlantic/Madeira',
'Atlantic/Reykjavik',
'Atlantic/South_Georgia',
'Atlantic/St_Helena',
'Atlantic/Stanley',
'Australia/ACT',
'Australia/Adelaide',
'Australia/Brisbane',
'Australia/Broken_Hill',
'Australia/Canberra',
'Australia/Currie',
'Australia/Darwin',
'Australia/Eucla',
'Australia/Hobart',
'Australia/LHI',
'Australia/Lindeman',
'Australia/Lord_Howe',
'Australia/Melbourne',
'Australia/NSW',
'Australia/North',
'Australia/Perth',
'Australia/Queensland',
'Australia/South',
'Australia/Sydney',
'Australia/Tasmania',
'Australia/Victoria',
'Australia/West',
'Australia/Yancowinna',
'Brazil/Acre',
'Brazil/DeNoronha',
'Brazil/East',
'Brazil/West',
'CET',
'CST6CDT',
'Canada/Atlantic',
'Canada/Central',
'Canada/Eastern',
'Canada/Mountain',
'Canada/Newfoundland',
'Canada/Pacific',
'Canada/Saskatchewan',
'Canada/Yukon',
'Chile/Continental',
'Chile/EasterIsland',
'Cuba',
'EET',
'EST',
'EST5EDT',
'Egypt',
'Eire',
'Etc/GMT',
'Etc/GMT+0']
If it’s midday in Adelaide, what time is it in Buenos Aires?
# define the midday time in Adelaide and 'localize' it to the Adelaide timezone
= pd.Timestamp(year=2019, month=3, day=23, hour=12)
time_adelaide_midday = time_adelaide_midday.tz_localize('Australia/Adelaide') time_adelaide_midday
# convert from one timezone to another
= time_adelaide_midday.tz_convert('America/Argentina/Buenos_Aires')
time_adelaide_midday_BA print(time_adelaide_midday_BA)
2019-03-22 22:30:00-03:00
# Now suppose we have a series in UTC
= pd.date_range('2019/02/15', '2019/03/17', tz='UTC')
times
# What are these values in Adelaide time?
'Australia/Adelaide') times.tz_convert(
DatetimeIndex(['2019-02-15 10:30:00+10:30', '2019-02-16 10:30:00+10:30',
'2019-02-17 10:30:00+10:30', '2019-02-18 10:30:00+10:30',
'2019-02-19 10:30:00+10:30', '2019-02-20 10:30:00+10:30',
'2019-02-21 10:30:00+10:30', '2019-02-22 10:30:00+10:30',
'2019-02-23 10:30:00+10:30', '2019-02-24 10:30:00+10:30',
'2019-02-25 10:30:00+10:30', '2019-02-26 10:30:00+10:30',
'2019-02-27 10:30:00+10:30', '2019-02-28 10:30:00+10:30',
'2019-03-01 10:30:00+10:30', '2019-03-02 10:30:00+10:30',
'2019-03-03 10:30:00+10:30', '2019-03-04 10:30:00+10:30',
'2019-03-05 10:30:00+10:30', '2019-03-06 10:30:00+10:30',
'2019-03-07 10:30:00+10:30', '2019-03-08 10:30:00+10:30',
'2019-03-09 10:30:00+10:30', '2019-03-10 10:30:00+10:30',
'2019-03-11 10:30:00+10:30', '2019-03-12 10:30:00+10:30',
'2019-03-13 10:30:00+10:30', '2019-03-14 10:30:00+10:30',
'2019-03-15 10:30:00+10:30', '2019-03-16 10:30:00+10:30',
'2019-03-17 10:30:00+10:30'],
dtype='datetime64[ns, Australia/Adelaide]', freq='D')