Hosted with nbsanity. See source notebook on GitHub.

Date and time variables in Pandas

23 March 2019

This notebook collects my current understanding of how to work with time variables in Pandas

References: - http://strftime.org/ - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html - https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

import pandas as pd

Intro

Let’s start with some Pandas series that consist of strings with dates and times in different formats

dates1 = pd.Series(['3/5/1972', '24/5/1984', '1/11/1977'])
dates2 = pd.Series(['17 May 1982', '12 June 2016'])
dates3 = pd.Series(['23/5/1984 17:31:07'])
dates4 = pd.Series(['8 November 1948', '3/6/1983'])
dates5 = pd.Series(['12 Nov 1912', '8 July 1723', '3/7/1982', '12   May, 1932'])

Now we convert each of these to a Pandas datetime series using the to_datetime method. We specify the input along with the format that the datetime values are expressed in.

dates1_pd = pd.to_datetime(dates1, format="%d/%m/%Y")  # day, month, year
dates2_pd = pd.to_datetime(dates2, format="%d %B %Y")  # day, month name, year
# day, month year and hour, minute, seconds
dates3_pd = pd.to_datetime(dates3, format="%d/%m/%Y %H:%M:%S")
# infer the datetime format based on the structure of the input string
dates4_pd = pd.to_datetime(dates4, infer_datetime_format=True, dayfirst=True)
dates5_pd = pd.to_datetime(dates5, infer_datetime_format=True, dayfirst=True)

print(dates1_pd)
print(dates2_pd)
print(dates3_pd)
print(dates4_pd)
print(dates5_pd)
0   1972-05-03
1   1984-05-24
2   1977-11-01
dtype: datetime64[ns]
0   1982-05-17
1   2016-06-12
dtype: datetime64[ns]
0   1984-05-23 17:31:07
dtype: datetime64[ns]
0   1948-11-08
1   1983-06-03
dtype: datetime64[ns]
0   1912-11-12
1   1723-07-08
2   1982-07-03
3   1932-05-12
dtype: datetime64[ns]

Working with Timestamps and extracting time information

Rather than writing a datetime as a string then converting it to a datetime type, we can create a series of datetime values directly using the Timestamp function. We can specifiy the year, month, day, hour, minute and second.

Below we do this using keyword arguments.

dates6 = pd.Series([pd.Timestamp(year=1972, day=3, month=5), pd.Timestamp(year=1984, month=5, day=24), 
               pd.Timestamp(year=1977, month=11, day=1)])

Here we specify a datetime value another way, without used keyword arguments.

pd.Timestamp(1984, 5, 24, 9, 45, 21)
Timestamp('1984-05-24 09:45:21')

The date_range method allows us to create a series of datetime values between a start and end date with a specified frequency, e.g., days, months, hours etc. Let’s a series consisting of all days between the release of Ghostbusters and today’s date.

date_range_egon = pd.date_range(start='11/15/1984', end='23/03/2019', freq='D')
date_range_egon
DatetimeIndex(['1984-11-15', '1984-11-16', '1984-11-17', '1984-11-18',
               '1984-11-19', '1984-11-20', '1984-11-21', '1984-11-22',
               '1984-11-23', '1984-11-24',
               ...
               '2019-03-14', '2019-03-15', '2019-03-16', '2019-03-17',
               '2019-03-18', '2019-03-19', '2019-03-20', '2019-03-21',
               '2019-03-22', '2019-03-23'],
              dtype='datetime64[ns]', length=12547, freq='D')

We can put this data into a data frame and then extract the day and month names, using the day_name() and month_name() methods.

# convert to Pandas dataframe
df = pd.DataFrame(data={'Date': date_range_egon, 'Day': date_range_egon.day_name(), 
                        'Month': date_range_egon.month_name()})

It’s sometimes useful to set the index of the dataframe to the datetime value. This then allows us to slice the dataframe based on an interval of date values.

# set the index to datetime value
df = df.set_index('Date')

# slice the dataframe based on time period
df['1987-12-01':'1987-12-24']
Day Month
Date
1987-12-01 Tuesday December
1987-12-02 Wednesday December
1987-12-03 Thursday December
1987-12-04 Friday December
1987-12-05 Saturday December
1987-12-06 Sunday December
1987-12-07 Monday December
1987-12-08 Tuesday December
1987-12-09 Wednesday December
1987-12-10 Thursday December
1987-12-11 Friday December
1987-12-12 Saturday December
1987-12-13 Sunday December
1987-12-14 Monday December
1987-12-15 Tuesday December
1987-12-16 Wednesday December
1987-12-17 Thursday December
1987-12-18 Friday December
1987-12-19 Saturday December
1987-12-20 Sunday December
1987-12-21 Monday December
1987-12-22 Tuesday December
1987-12-23 Wednesday December
1987-12-24 Thursday December

So which day of the week was Ghostbusters released on?

df['1984-11-15':'1984-11-15']
Day Month
Date
1984-11-15 Thursday November

Using accessors to extract days, years and months

Once a field has been converted to a datetime series, we can extract the day, year, month etc using accessor methods. I’m still learning about these. A nice explanation is available here: https://realpython.com/python-pandas-tricks/

# given a dataframe of datetime like values...
dt1 = pd.to_datetime(dates3, format="%d/%m/%Y %H:%M:%S")

# extract the time components using the dt accessor
print(dt1)
print(dt1.dt.minute)
print(dt1.dt.hour)
print(dt1.dt.day)
print(dt1.dt.day_name())
print(dt1.dt.month)
print(dt1.dt.month_name())
print(dt1.dt.year)
print(dt1.dt.quarter)
0   1984-05-23 17:31:07
dtype: datetime64[ns]
0    31
dtype: int64
0    17
dtype: int64
0    23
dtype: int64
0    Wednesday
dtype: object
0    5
dtype: int64
0    May
dtype: object
0    1984
dtype: int64
0    2
dtype: int64

Time differences

Taking the difference between two Timestamp objects returns a Timedelta object.

# the number of days between when the date that Ghostbusters was released and today
time1 = pd.Timestamp(2019, 3, 23)
time2 = pd.Timestamp(1984, 11, 15)

# returns a Timedelta object
time_delta1 = (time1 - time2)
# how many days difference. Returns an integer
print(time_delta1.days)

# how many years difference?
print(time_delta1.days / 365)
12546
34.37260273972603

Timezones

Finally let’s look at timezones.

from pytz import all_timezones
# All the different timezones we can use
all_timezones[0:390]
['Africa/Abidjan',
 'Africa/Accra',
 'Africa/Addis_Ababa',
 'Africa/Algiers',
 'Africa/Asmara',
 'Africa/Asmera',
 'Africa/Bamako',
 'Africa/Bangui',
 'Africa/Banjul',
 'Africa/Bissau',
 'Africa/Blantyre',
 'Africa/Brazzaville',
 'Africa/Bujumbura',
 'Africa/Cairo',
 'Africa/Casablanca',
 'Africa/Ceuta',
 'Africa/Conakry',
 'Africa/Dakar',
 'Africa/Dar_es_Salaam',
 'Africa/Djibouti',
 'Africa/Douala',
 'Africa/El_Aaiun',
 'Africa/Freetown',
 'Africa/Gaborone',
 'Africa/Harare',
 'Africa/Johannesburg',
 'Africa/Juba',
 'Africa/Kampala',
 'Africa/Khartoum',
 'Africa/Kigali',
 'Africa/Kinshasa',
 'Africa/Lagos',
 'Africa/Libreville',
 'Africa/Lome',
 'Africa/Luanda',
 'Africa/Lubumbashi',
 'Africa/Lusaka',
 'Africa/Malabo',
 'Africa/Maputo',
 'Africa/Maseru',
 'Africa/Mbabane',
 'Africa/Mogadishu',
 'Africa/Monrovia',
 'Africa/Nairobi',
 'Africa/Ndjamena',
 'Africa/Niamey',
 'Africa/Nouakchott',
 'Africa/Ouagadougou',
 'Africa/Porto-Novo',
 'Africa/Sao_Tome',
 'Africa/Timbuktu',
 'Africa/Tripoli',
 'Africa/Tunis',
 'Africa/Windhoek',
 'America/Adak',
 'America/Anchorage',
 'America/Anguilla',
 'America/Antigua',
 'America/Araguaina',
 'America/Argentina/Buenos_Aires',
 'America/Argentina/Catamarca',
 'America/Argentina/ComodRivadavia',
 'America/Argentina/Cordoba',
 'America/Argentina/Jujuy',
 'America/Argentina/La_Rioja',
 'America/Argentina/Mendoza',
 'America/Argentina/Rio_Gallegos',
 'America/Argentina/Salta',
 'America/Argentina/San_Juan',
 'America/Argentina/San_Luis',
 'America/Argentina/Tucuman',
 'America/Argentina/Ushuaia',
 'America/Aruba',
 'America/Asuncion',
 'America/Atikokan',
 'America/Atka',
 'America/Bahia',
 'America/Bahia_Banderas',
 'America/Barbados',
 'America/Belem',
 'America/Belize',
 'America/Blanc-Sablon',
 'America/Boa_Vista',
 'America/Bogota',
 'America/Boise',
 'America/Buenos_Aires',
 'America/Cambridge_Bay',
 'America/Campo_Grande',
 'America/Cancun',
 'America/Caracas',
 'America/Catamarca',
 'America/Cayenne',
 'America/Cayman',
 'America/Chicago',
 'America/Chihuahua',
 'America/Coral_Harbour',
 'America/Cordoba',
 'America/Costa_Rica',
 'America/Creston',
 'America/Cuiaba',
 'America/Curacao',
 'America/Danmarkshavn',
 'America/Dawson',
 'America/Dawson_Creek',
 'America/Denver',
 'America/Detroit',
 'America/Dominica',
 'America/Edmonton',
 'America/Eirunepe',
 'America/El_Salvador',
 'America/Ensenada',
 'America/Fort_Nelson',
 'America/Fort_Wayne',
 'America/Fortaleza',
 'America/Glace_Bay',
 'America/Godthab',
 'America/Goose_Bay',
 'America/Grand_Turk',
 'America/Grenada',
 'America/Guadeloupe',
 'America/Guatemala',
 'America/Guayaquil',
 'America/Guyana',
 'America/Halifax',
 'America/Havana',
 'America/Hermosillo',
 'America/Indiana/Indianapolis',
 'America/Indiana/Knox',
 'America/Indiana/Marengo',
 'America/Indiana/Petersburg',
 'America/Indiana/Tell_City',
 'America/Indiana/Vevay',
 'America/Indiana/Vincennes',
 'America/Indiana/Winamac',
 'America/Indianapolis',
 'America/Inuvik',
 'America/Iqaluit',
 'America/Jamaica',
 'America/Jujuy',
 'America/Juneau',
 'America/Kentucky/Louisville',
 'America/Kentucky/Monticello',
 'America/Knox_IN',
 'America/Kralendijk',
 'America/La_Paz',
 'America/Lima',
 'America/Los_Angeles',
 'America/Louisville',
 'America/Lower_Princes',
 'America/Maceio',
 'America/Managua',
 'America/Manaus',
 'America/Marigot',
 'America/Martinique',
 'America/Matamoros',
 'America/Mazatlan',
 'America/Mendoza',
 'America/Menominee',
 'America/Merida',
 'America/Metlakatla',
 'America/Mexico_City',
 'America/Miquelon',
 'America/Moncton',
 'America/Monterrey',
 'America/Montevideo',
 'America/Montreal',
 'America/Montserrat',
 'America/Nassau',
 'America/New_York',
 'America/Nipigon',
 'America/Nome',
 'America/Noronha',
 'America/North_Dakota/Beulah',
 'America/North_Dakota/Center',
 'America/North_Dakota/New_Salem',
 'America/Ojinaga',
 'America/Panama',
 'America/Pangnirtung',
 'America/Paramaribo',
 'America/Phoenix',
 'America/Port-au-Prince',
 'America/Port_of_Spain',
 'America/Porto_Acre',
 'America/Porto_Velho',
 'America/Puerto_Rico',
 'America/Punta_Arenas',
 'America/Rainy_River',
 'America/Rankin_Inlet',
 'America/Recife',
 'America/Regina',
 'America/Resolute',
 'America/Rio_Branco',
 'America/Rosario',
 'America/Santa_Isabel',
 'America/Santarem',
 'America/Santiago',
 'America/Santo_Domingo',
 'America/Sao_Paulo',
 'America/Scoresbysund',
 'America/Shiprock',
 'America/Sitka',
 'America/St_Barthelemy',
 'America/St_Johns',
 'America/St_Kitts',
 'America/St_Lucia',
 'America/St_Thomas',
 'America/St_Vincent',
 'America/Swift_Current',
 'America/Tegucigalpa',
 'America/Thule',
 'America/Thunder_Bay',
 'America/Tijuana',
 'America/Toronto',
 'America/Tortola',
 'America/Vancouver',
 'America/Virgin',
 'America/Whitehorse',
 'America/Winnipeg',
 'America/Yakutat',
 'America/Yellowknife',
 'Antarctica/Casey',
 'Antarctica/Davis',
 'Antarctica/DumontDUrville',
 'Antarctica/Macquarie',
 'Antarctica/Mawson',
 'Antarctica/McMurdo',
 'Antarctica/Palmer',
 'Antarctica/Rothera',
 'Antarctica/South_Pole',
 'Antarctica/Syowa',
 'Antarctica/Troll',
 'Antarctica/Vostok',
 'Arctic/Longyearbyen',
 'Asia/Aden',
 'Asia/Almaty',
 'Asia/Amman',
 'Asia/Anadyr',
 'Asia/Aqtau',
 'Asia/Aqtobe',
 'Asia/Ashgabat',
 'Asia/Ashkhabad',
 'Asia/Atyrau',
 'Asia/Baghdad',
 'Asia/Bahrain',
 'Asia/Baku',
 'Asia/Bangkok',
 'Asia/Barnaul',
 'Asia/Beirut',
 'Asia/Bishkek',
 'Asia/Brunei',
 'Asia/Calcutta',
 'Asia/Chita',
 'Asia/Choibalsan',
 'Asia/Chongqing',
 'Asia/Chungking',
 'Asia/Colombo',
 'Asia/Dacca',
 'Asia/Damascus',
 'Asia/Dhaka',
 'Asia/Dili',
 'Asia/Dubai',
 'Asia/Dushanbe',
 'Asia/Famagusta',
 'Asia/Gaza',
 'Asia/Harbin',
 'Asia/Hebron',
 'Asia/Ho_Chi_Minh',
 'Asia/Hong_Kong',
 'Asia/Hovd',
 'Asia/Irkutsk',
 'Asia/Istanbul',
 'Asia/Jakarta',
 'Asia/Jayapura',
 'Asia/Jerusalem',
 'Asia/Kabul',
 'Asia/Kamchatka',
 'Asia/Karachi',
 'Asia/Kashgar',
 'Asia/Kathmandu',
 'Asia/Katmandu',
 'Asia/Khandyga',
 'Asia/Kolkata',
 'Asia/Krasnoyarsk',
 'Asia/Kuala_Lumpur',
 'Asia/Kuching',
 'Asia/Kuwait',
 'Asia/Macao',
 'Asia/Macau',
 'Asia/Magadan',
 'Asia/Makassar',
 'Asia/Manila',
 'Asia/Muscat',
 'Asia/Nicosia',
 'Asia/Novokuznetsk',
 'Asia/Novosibirsk',
 'Asia/Omsk',
 'Asia/Oral',
 'Asia/Phnom_Penh',
 'Asia/Pontianak',
 'Asia/Pyongyang',
 'Asia/Qatar',
 'Asia/Qyzylorda',
 'Asia/Rangoon',
 'Asia/Riyadh',
 'Asia/Saigon',
 'Asia/Sakhalin',
 'Asia/Samarkand',
 'Asia/Seoul',
 'Asia/Shanghai',
 'Asia/Singapore',
 'Asia/Srednekolymsk',
 'Asia/Taipei',
 'Asia/Tashkent',
 'Asia/Tbilisi',
 'Asia/Tehran',
 'Asia/Tel_Aviv',
 'Asia/Thimbu',
 'Asia/Thimphu',
 'Asia/Tokyo',
 'Asia/Tomsk',
 'Asia/Ujung_Pandang',
 'Asia/Ulaanbaatar',
 'Asia/Ulan_Bator',
 'Asia/Urumqi',
 'Asia/Ust-Nera',
 'Asia/Vientiane',
 'Asia/Vladivostok',
 'Asia/Yakutsk',
 'Asia/Yangon',
 'Asia/Yekaterinburg',
 'Asia/Yerevan',
 'Atlantic/Azores',
 'Atlantic/Bermuda',
 'Atlantic/Canary',
 'Atlantic/Cape_Verde',
 'Atlantic/Faeroe',
 'Atlantic/Faroe',
 'Atlantic/Jan_Mayen',
 'Atlantic/Madeira',
 'Atlantic/Reykjavik',
 'Atlantic/South_Georgia',
 'Atlantic/St_Helena',
 'Atlantic/Stanley',
 'Australia/ACT',
 'Australia/Adelaide',
 'Australia/Brisbane',
 'Australia/Broken_Hill',
 'Australia/Canberra',
 'Australia/Currie',
 'Australia/Darwin',
 'Australia/Eucla',
 'Australia/Hobart',
 'Australia/LHI',
 'Australia/Lindeman',
 'Australia/Lord_Howe',
 'Australia/Melbourne',
 'Australia/NSW',
 'Australia/North',
 'Australia/Perth',
 'Australia/Queensland',
 'Australia/South',
 'Australia/Sydney',
 'Australia/Tasmania',
 'Australia/Victoria',
 'Australia/West',
 'Australia/Yancowinna',
 'Brazil/Acre',
 'Brazil/DeNoronha',
 'Brazil/East',
 'Brazil/West',
 'CET',
 'CST6CDT',
 'Canada/Atlantic',
 'Canada/Central',
 'Canada/Eastern',
 'Canada/Mountain',
 'Canada/Newfoundland',
 'Canada/Pacific',
 'Canada/Saskatchewan',
 'Canada/Yukon',
 'Chile/Continental',
 'Chile/EasterIsland',
 'Cuba',
 'EET',
 'EST',
 'EST5EDT',
 'Egypt',
 'Eire',
 'Etc/GMT',
 'Etc/GMT+0']

If it’s midday in Adelaide, what time is it in Buenos Aires?

# define the midday time in Adelaide and 'localize' it to the Adelaide timezone
time_adelaide_midday = pd.Timestamp(year=2019, month=3, day=23, hour=12)
time_adelaide_midday = time_adelaide_midday.tz_localize('Australia/Adelaide')
# convert from one timezone to another
time_adelaide_midday_BA = time_adelaide_midday.tz_convert('America/Argentina/Buenos_Aires')
print(time_adelaide_midday_BA)
2019-03-22 22:30:00-03:00
# Now suppose we have a series in UTC
times = pd.date_range('2019/02/15', '2019/03/17', tz='UTC')

# What are these values in Adelaide time?
times.tz_convert('Australia/Adelaide')
DatetimeIndex(['2019-02-15 10:30:00+10:30', '2019-02-16 10:30:00+10:30',
               '2019-02-17 10:30:00+10:30', '2019-02-18 10:30:00+10:30',
               '2019-02-19 10:30:00+10:30', '2019-02-20 10:30:00+10:30',
               '2019-02-21 10:30:00+10:30', '2019-02-22 10:30:00+10:30',
               '2019-02-23 10:30:00+10:30', '2019-02-24 10:30:00+10:30',
               '2019-02-25 10:30:00+10:30', '2019-02-26 10:30:00+10:30',
               '2019-02-27 10:30:00+10:30', '2019-02-28 10:30:00+10:30',
               '2019-03-01 10:30:00+10:30', '2019-03-02 10:30:00+10:30',
               '2019-03-03 10:30:00+10:30', '2019-03-04 10:30:00+10:30',
               '2019-03-05 10:30:00+10:30', '2019-03-06 10:30:00+10:30',
               '2019-03-07 10:30:00+10:30', '2019-03-08 10:30:00+10:30',
               '2019-03-09 10:30:00+10:30', '2019-03-10 10:30:00+10:30',
               '2019-03-11 10:30:00+10:30', '2019-03-12 10:30:00+10:30',
               '2019-03-13 10:30:00+10:30', '2019-03-14 10:30:00+10:30',
               '2019-03-15 10:30:00+10:30', '2019-03-16 10:30:00+10:30',
               '2019-03-17 10:30:00+10:30'],
              dtype='datetime64[ns, Australia/Adelaide]', freq='D')