Hosted with nbsanity. See source notebook on GitHub.

Population estimates on uniform geometries

4 June 2019

ABS releases population figures at various levels: SA1, SA2,… and also at meshblock scale. This notebook contains code for obtaining estimates of populations on other geometries, in particular uniform hex and square grids.

We do this using the known populations from the ABS 2016 Census at the meshblock level and then perform a weighted average over these with each of the cells in the new geometry. The result is a more granular estimate of the population than at SA2 level. This can be useful when working with data sources that are defined on uniform geometries rather than standard ABS geometries.

References: - https://www.abs.gov.au/ausstats/[email protected]/Lookup/by%20Subject/1270.0.55.001_July%202016Main%20Features_{Mesh%20Blocks%20(MB)}10012

https://www.abs.gov.au/ausstats/[email protected]/mf/2074.0
https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Australian+Statistical+Geography+Standard+(ASGS)

1.0. Setup

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np

import folium

from shapely.geometry import Polygon, Point, MultiPolygon
from shapely import wkt

%matplotlib inline

1.1. Population transformation function

This function carriers out the estimates of the populations on the new geometry. It takes two shapefiles:

gdf1: the meshblock geometry, population figures and areas of the meshblocks
gdf2: the shapefile for the new geometry

The approach is as follows. For each of the cells in the new geometry: - Calculate the meshblocks that is intersects with and the corresponding areas. - Compute the fraction of overlap there is with each of these meshblocks based on the area. - Estimate the population by taking a weighted sum of the populations of the intersecting meshblocks and the corresponding overlap fractions

def parameter_estimate(gdf1, gdf2, area_field, param_field, label2_field):
    
    # calculate the overlay
    gdf_overlay = gpd.overlay(gdf1, gdf2, how='intersection')
    
    # calculate the area of each of the new polygons
    gdf_overlay['area_new'] = gdf_overlay['geometry'].apply(lambda p: p.area)
    
    # calculate the fraction of the new polygons intersecting with the old ones
    gdf_overlay['fraction_overlap'] = gdf_overlay['area_new'] / gdf_overlay[area_field]
    
    # estimate the value of the parameter for each of the new polygons in their overlaps
    gdf_overlay[param_field + '_new'] = gdf_overlay['fraction_overlap'] * gdf_overlay[param_field]
    
    # calculate the weighted sum to estimate the new value for each of the new polygons
    df_ws = gdf_overlay.groupby(label2_field)[param_field + '_new'].sum().reset_index()
    
    return df_ws

1.2. Define the CRS

crs_abs is the lat, lng coord system
crs_vic is a projected CRS

# define a CRS
crs_abs = {'init': 'epsg:4283'}

# the projected CRS
crs_vic = {'init': 'epsg:28355'}

2.0. Load up datasets

We use following datasets: - Victorian meshblocks which contain the counts of people and dwellings - Victorian meshblock shapefiles containing the geometry - Regular 1km x 1km square shapefile - Regular hexagon shapefile

2.1. Victorian meshblocks with person and dwelling counts

# load up the Vic meshblock codes and corresponding dwelling and person counts
df_mb_vic = pd.read_csv('data/refined/MB_VIC_2016.csv')

# let's have a look at this
df_mb_vic.head(5)

	MB_CODE_2016	MB_CATEGORY_NAME_2016_y	SA1_MAINCODE_2016	SA2_NAME_2016	SA3_NAME_2016	SA4_NAME_2016	GCCSA_NAME_2016	Dwelling	Person
0	20000009499	NOUSUALRESIDENCE	29999949999	No usual address (Vic.)	No usual address (Vic.)	No usual address (Vic.)	No usual address (Vic.)	NaN	7565.0
1	20000010000	Education	20403106914	Bright - Mount Beauty	Wodonga - Alpine	Hume	Rest of Vic.	0.0	0.0
2	20000021000	Commercial	20403106902	Bright - Mount Beauty	Wodonga - Alpine	Hume	Rest of Vic.	3.0	0.0
3	20000022000	Commercial	20403106902	Bright - Mount Beauty	Wodonga - Alpine	Hume	Rest of Vic.	3.0	3.0
4	20000023000	Commercial	20403106902	Bright - Mount Beauty	Wodonga - Alpine	Hume	Rest of Vic.	5.0	5.0

2.2. Regular 1km x 1km square grid shapefile

gdf_1km_sqr = gpd.read_file('data/refined/sqrgrid_aoi_1km.shp')
gdf_1km_sqr.head(5)

	SquareLabe	geometry
0	5777	POLYGON ((143.6154027552672 -37.98629078606395...
1	6069	POLYGON ((143.6247063477386 -38.03160383838987...
2	6070	POLYGON ((143.6251195822346 -38.02260665306308...
3	6071	POLYGON ((143.6255326327272 -38.01360945011705...
4	6074	POLYGON ((143.6267706810596 -37.98661773559558...

2.3. Regular hexagaon shapefile (1km side lengths)

gdf_1km_hex = gpd.read_file('data/refined/hexgrid_aoi_1km.shp')
gdf_1km_hex.head(5)

	HexLabel	geometry
0	13733	POLYGON ((144.3291385892137 -37.62172602425927...
1	14027	POLYGON ((144.3477842656918 -37.64917983509825...
2	14028	POLYGON ((144.3384597639403 -37.63545324745296...
3	14029	POLYGON ((144.3487448672874 -37.62216848943908...
4	14030	POLYGON ((144.3394232637044 -37.60844204331411...

2.4. Victorian meshblocks shapefile

# load up the meshblocks shapefiles
gdf_mb = gpd.read_file('data/opendata/Census/1270055001_mb_2016_vic_shape/MB_2016_VIC.shp')

# rename the column to be consistent with the other data sources
gdf_mb = gdf_mb.rename(columns={'MB_CODE16': 'MB_CODE_2016'})
gdf_mb = gdf_mb[['MB_CODE_2016', 'geometry']]

# convert the MB_CODE_2016 to a integer
gdf_mb['MB_CODE_2016'] = gdf_mb['MB_CODE_2016'].astype(np.int)

gdf_mb.head(5)

	MB_CODE_2016	geometry
0	20000009499	None
1	20000010000	POLYGON ((147.1425276460001 -36.69220601699993...
2	20000021000	POLYGON ((146.9593400960001 -36.72780516099994...
3	20000022000	POLYGON ((146.9613617580001 -36.72686980099996...
4	20000023000	POLYGON ((146.9610484790001 -36.72822697899994...

3.0. Wrangling

3.1. Filter to SA2 (Suburb)

# look at just one suburb
#SA2_NAME_2016 = 'Carlton'
#SA3_NAME_2016 = 'Melbourne City'
#df_smaller = df_mb_vic[df_mb_vic['SA3_NAME_2016']==SA3_NAME_2016]
#df_smaller.head(5)

3.2. Filter to Greater Melbourne area

Select ‘Greater Melbourne’ as the GCCSA_NAME_2016 value and the meshblocks in this region.

GCCSA_NAME_2016 = 'Greater Melbourne'
df_aoi = df_mb_vic[df_mb_vic['GCCSA_NAME_2016']==GCCSA_NAME_2016]
df_aoi.head(5)

	MB_CODE_2016	MB_CATEGORY_NAME_2016_y	SA1_MAINCODE_2016	SA2_NAME_2016	SA3_NAME_2016	SA4_NAME_2016	GCCSA_NAME_2016	Dwelling	Person
1599	20015400000	Residential	20901119918	Heidelberg West	Banyule	Melbourne - North East	Greater Melbourne	33.0	53.0
1600	20015410000	Residential	20901120346	Viewbank - Yallambie	Banyule	Melbourne - North East	Greater Melbourne	32.0	87.0
1601	20015420000	Residential	20901120316	Viewbank - Yallambie	Banyule	Melbourne - North East	Greater Melbourne	63.0	178.0
1602	20015430000	Residential	20901120318	Viewbank - Yallambie	Banyule	Melbourne - North East	Greater Melbourne	47.0	115.0
1603	20015440000	Residential	20901120012	Ivanhoe	Banyule	Melbourne - North East	Greater Melbourne	33.0	68.0

Check that the population we’re getting is reasonable

df_aoi['Person'].sum()

4484394.0

3.3. Join the meshblock shapefiles with the mesh block population data

This is done for just our Area of Interest

# join the shapefiles with the Vic. meshblock data
gdf_mb_sa2 = df_aoi.merge(gdf_mb, on='MB_CODE_2016')

nice_cols = ['MB_CODE_2016', 'MB_CATEGORY_NAME_2016_y', 'SA1_MAINCODE_2016', 'SA2_NAME_2016', 'SA3_NAME_2016',
            'Dwelling', 'Person']

meshblocks_sa2 = gpd.GeoDataFrame(data=gdf_mb_sa2[nice_cols], geometry=gdf_mb_sa2['geometry'])
meshblocks_sa2.crs = crs_abs

3.4. Intersect with square grid covering the AOI

gdf_sa2 = gpd.sjoin(gdf_1km_sqr, meshblocks_sa2, op='intersects')
sqr_sa2 = gdf_sa2.dissolve(by='SquareLabe', aggfunc='first')

3.5. Intersect with hex grid covering the AOI

gdf_sa2 = gpd.sjoin(gdf_1km_hex, meshblocks_sa2, op='intersects')
hex_sa2 = gdf_sa2.dissolve(by='HexLabel', aggfunc='first')

4.0. Population estimates

4.1. Change CRS

Because I don’t know how to do area calculations correctly in the lat, lng CRS

meshblocks_sa2_new = meshblocks_sa2.to_crs(crs_vic)
sqr_sa2_new = sqr_sa2.to_crs(crs_vic) # convert the squares
hex_sa2_new = hex_sa2.to_crs(crs_vic) # convert the hexagons

meshblocks_sa2_new.head(5)

	MB_CODE_2016	MB_CATEGORY_NAME_2016_y	SA1_MAINCODE_2016	SA2_NAME_2016	SA3_NAME_2016	Dwelling	Person	geometry
0	20015400000	Residential	20901119918	Heidelberg West	Banyule	33.0	53.0	POLYGON ((328410.1439992741 5819338.914122874,...
1	20015410000	Residential	20901120346	Viewbank - Yallambie	Banyule	32.0	87.0	POLYGON ((332284.1471433818 5821512.586353699,...
2	20015420000	Residential	20901120316	Viewbank - Yallambie	Banyule	63.0	178.0	POLYGON ((332245.6594393802 5822051.032302149,...
3	20015430000	Residential	20901120318	Viewbank - Yallambie	Banyule	47.0	115.0	POLYGON ((332663.5247888024 5822260.237469032,...
4	20015440000	Residential	20901120012	Ivanhoe	Banyule	33.0	68.0	POLYGON ((326746.5876069277 5818877.766105011,...

4.2. Filter data

We only need a few of the columns from the above datasets.

squares = sqr_sa2_new[['MB_CODE_2016', 'geometry']].reset_index()
squares.head(5)

	SquareLabe	MB_CODE_2016	geometry
0	23934	20631929990	POLYGON ((263743.8136395717 5830298.752182018,...
1	24231	20631929990	POLYGON ((264743.8136395747 5830298.752182018,...
2	24232	20631929990	POLYGON ((264743.8136395735 5831298.752182018,...
3	24233	20631929990	POLYGON ((264743.8136395724 5832298.752182018,...
4	24525	20631929990	POLYGON ((265743.8136395731 5827298.752182018,...

hexes = hex_sa2_new[['MB_CODE_2016', 'geometry']].reset_index()
hexes.head(5)

	HexLabel	MB_CODE_2016	geometry
0	13733	20631929990	POLYGON ((264284.1761915246 5832798.752182018,...
1	14027	20631929990	POLYGON ((266016.2269990934 5829798.752182018,...
2	14028	20631929990	POLYGON ((265150.201595313 5831298.752182018, ...
3	14029	20631929990	POLYGON ((266016.226999094 5832798.752182018, ...
4	14030	20631929990	POLYGON ((265150.2015953095 5834298.752182016,...

From the meshblock data we need the meshblock code, meshblock polygons and the population within each meshblock

mesh = meshblocks_sa2_new[['MB_CODE_2016', 'Person', 'geometry']]

# calculate the area of each of the meshblocks in the projected CRS
mesh['area'] = mesh['geometry'].apply(lambda p: p.area)

/Users/vic2e3a/.conda/envs/uberenv/lib/python3.6/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.

mesh.head(5)

	MB_CODE_2016	Person	geometry	area
0	20015400000	53.0	POLYGON ((328410.1439992741 5819338.914122874,...	12500.606939
1	20015410000	87.0	POLYGON ((332284.1471433818 5821512.586353699,...	27133.446933
2	20015420000	178.0	POLYGON ((332245.6594393802 5822051.032302149,...	44286.746329
3	20015430000	115.0	POLYGON ((332663.5247888024 5822260.237469032,...	45616.607446
4	20015440000	68.0	POLYGON ((326746.5876069277 5818877.766105011,...	24518.744626

4.3. Calculate the population estimates

Given the meshblock grid on which we have population data we want to estimate how many people are in each of our regular grid cells.

square_data = np.round(parameter_estimate(mesh, squares, 'area', 'Person', 'SquareLabe'))

square_data.head(5)

	SquareLabe	Person_new
0	23934	0.0
1	24231	4.0
2	24232	4.0
3	24233	2.0
4	24525	1.0

hex_data = np.round(parameter_estimate(mesh, hexes, 'area', 'Person', 'HexLabel'))

hex_data.head(5)

	HexLabel	Person_new
0	13733	1.0
1	14027	5.0
2	14028	3.0
3	14029	10.0
4	14030	2.0

4.4. Output the new estimates

Note that there is a population estimate associated with each of the polygons. These are labelled by ‘SquareLabe’ (for the square geometry) and ‘HexLabel’ for the hexagon geometry.

square_data.to_csv('data/refined/population_sqr_1km.csv', index=False)

hex_data.to_csv('data/refined/population_hex_1km.csv', index=False)

square_data.head(5)

	SquareLabe	Person_new
0	23934	0.0
1	24231	4.0
2	24232	4.0
3	24233	2.0
4	24525	1.0

hex_data.head(5)

	HexLabel	Person_new
0	13733	1.0
1	14027	5.0
2	14028	3.0
3	14029	10.0
4	14030	2.0

square_data['Person_new'].sum()

4484488.0

hex_data['Person_new'].sum()

4484421.0