After enduring the hottest Australian summer on record in 2019/20, sheltered inside with the air-con and air purifier running while large parts of my home state of New South Wales burned, I decided to run some exploratory analysis on the Bureau of Meteorology’s fantastic ACORN-SAT 2.1 database.
The Australian Climate Observations Reference Network – Surface Air Temperature (ACORN-SAT) is the dataset used by the Bureau of Meteorology to monitor long-term temperature trends in Australia. ACORN-SAT uses observations from 112 weather stations in all corners of Australia, selected for the quality and length of their available temperature data.
Setup
Let’s get started by importing the required packages:
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chart_studio.plotly as py
import plotly.graph_objects as go
init_notebook_mode(connected=False)
import plotly.express as px
import numpy as np
import seaborn as sns
import string
import glob
import plotly.io as pio
Some Plotly visualisations require a Mapbox account - you can register for the basic account tier for free.
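#You will need your own token, saved in a local file named .mapbox_token
with open(".mapbox_token") as f:
    #strip() removes any trailing newline left by the text editor
    token = f.read().strip()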
places = pd.read_csv("acorn_sat_v2.1.0_stations.csv")
places.head()
| | stn_num | stn_name | lat | lon | elevation |
---|---|---|---|---|---|
0 | 1019 | Kalumburu | -14.30 | 126.65 | 23 |
1 | 2079 | Halls Creek | -18.23 | 127.67 | 409 |
2 | 3003 | Broome | -17.95 | 122.24 | 7 |
3 | 4032 | Port Hedland | -20.37 | 118.63 | 6 |
4 | 4106 | Marble Bar | -21.18 | 119.75 | 182 |
The code above creates a dataframe from a CSV listing the weather station names and locations.
The temperature readings come split into ‘max’ and ‘min’ folders, each containing one CSV per weather station. The code below loops through each CSV, creates a dataframe for each station and runs some pre-processing, then concatenates the results.
#Loop through the weather station CSVs, run some preprocessing and add the dataframes to a temporary list
path = "max"
li = []
#Index the station list by station number so coordinates are matched by ID, not by file order
places_by_num = places.set_index('stn_num')
#sorted() makes the file order deterministic (glob does not guarantee any order)
for f in sorted(glob.glob(path + "/*.csv")):
    df = pd.read_csv(f, index_col=None, header=0)
    #The first row of each file holds the site number and name; copy them to every row
    site_number = int(df['site number'][0])
    df['site number'] = df['site number'][0]
    df['site name'] = string.capwords(df['site name'][0])
    df['lat'] = places_by_num.loc[site_number, 'lat']
    df['lon'] = places_by_num.loc[site_number, 'lon']
    df = df.drop(df.index[0])
    df['year'] = pd.to_datetime(df['date']).dt.year
    #Baseline: mean maximum temperature for 1950-1979 inclusive
    df['long term average'] = df['maximum temperature (degC)'][(df['year']<1980) & (df['year']>1949)].mean()
    li.append(df)
#Concatenate list of dataframes into one large dataframe of nearly 4 million rows.
df = pd.concat(li, axis=0, ignore_index=True)
#Do likewise with the weather station data from the 'min' dataset
path = "min"
li = []
for f in sorted(glob.glob(path + "/*.csv")):
    df_min = pd.read_csv(f, index_col=None, header=0)
    df_min = df_min.drop(df_min.index[0])
    li.append(df_min)
df_min = pd.concat(li, axis=0, ignore_index=True)
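#Add the minimum temperature data to the first dataframe
#(rows are matched by position, so both folders must hold the same stations and dates in the same order)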
df['min'] = df_min['minimum temperature (degC)']
df.rename(columns = {"maximum temperature (degC)": "max"}, inplace=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = pd.to_datetime(df['date']).dt.year
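The positional assignment of the `min` column above only works if the 'max' and 'min' datasets line up row for row. As a minimal sketch of a more defensive alternative, the two could be joined on station and date instead. The toy dataframes and their values below are made up for illustration and are not part of the ACORN-SAT data.

```python
import pandas as pd

# Toy stand-ins for the concatenated 'max' and 'min' dataframes (values are invented)
df_max_demo = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "site number": [1019, 1019, 1019],
    "max": [35.1, 36.4, 34.8],
})
df_min_demo = pd.DataFrame({
    # Deliberately out of order and missing one day
    "date": pd.to_datetime(["2019-01-03", "2019-01-01"]),
    "site number": [1019, 1019],
    "min": [24.9, 25.3],
})

# Join on the keys rather than on row position; validate raises if a key is duplicated
merged = df_max_demo.merge(
    df_min_demo,
    on=["site number", "date"],
    how="left",
    validate="one_to_one",
)
print(merged)
```

With a key-based join, a day missing from one dataset shows up as a missing value rather than silently shifting every later row.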
Exploring the Australian Bureau of Meteorology data
We start by printing some basic information describing the dataset:
df.head()
| | date | max | site number | site name | lat | lon | year | long term average | min |
---|---|---|---|---|---|---|---|---|---|
0 | 1941-09-01 | 31.0 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 20.5 |
1 | 1941-09-02 | 31.0 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 20.5 |
2 | 1941-09-03 | 30.5 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 19.3 |
3 | 1941-09-04 | 38.8 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 21.0 |
4 | 1941-09-05 | 32.1 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 19.6 |
df.dtypes
date datetime64[ns]
max float64
site number float64
site name object
lat float64
lon float64
year int64
long term average float64
min float64
dtype: object
df.describe(include=[object])
| | site name |
---|---|
count | 3882887 |
unique | 112 |
top | Cairns Aero |
freq | 40177 |
There is a not insignificant number of missing values in this dataset: 103,673 of the 3,882,887 maximum temperature readings, or about 2.7%. Given that we are exploring mean temperature values, this is not especially critical.
pd.isnull(df['max']).value_counts()
False 3779214
True 103673
Name: max, dtype: int64
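A quick way to see whether the gaps are spread evenly or concentrated in a few stations is to compute the share of missing readings per site. The sketch below uses a small made-up dataframe with the same `site name` and `max` column names. It also shows that pandas skips missing values by default when averaging.

```python
import numpy as np
import pandas as pd

# Toy data: station "A" has one gap, station "B" has no valid readings at all
demo = pd.DataFrame({
    "site name": ["A", "A", "A", "A", "B", "B"],
    "max": [30.0, np.nan, 32.0, 34.0, np.nan, np.nan],
})

# isna() yields booleans, so the group mean is the fraction of missing readings
missing_share = demo["max"].isna().groupby(demo["site name"]).mean()
overall_share = demo["max"].isna().mean()

# mean() ignores NaN by default; a station with no valid readings returns NaN
mean_max = demo.groupby("site name")["max"].mean()

print(missing_share)
print(mean_max)
```

If the missing readings cluster in particular stations or seasons, the averages can still be biased even though the calculation itself runs without error.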
Visualising Extreme Heat Events
These are the days when you just can’t afford for your air-con to break. When koalas come down from their gum trees for a swim in the billabong. Using the BoM dataset, let’s find out where in Australia you would be better off living underground, and whether occurrences of 40°C+ (104 °F+) days are on the rise.
dfheat = df.copy()
#Tag days where the maximum temperature reached or exceeded 40°C (104 °F)
dfheat['Average Days of 40C+ / Year'] = np.where(dfheat['max'] >= 40, 1, 0)
dfheat.head()
| | date | max | site number | site name | lat | lon | year | long term average | min | Average Days of 40C+ / Year |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1941-09-01 | 31.0 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 20.5 | 0 |
1 | 1941-09-02 | 31.0 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 20.5 | 0 |
2 | 1941-09-03 | 30.5 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 19.3 | 0 |
3 | 1941-09-04 | 38.8 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 21.0 | 0 |
4 | 1941-09-05 | 32.1 | 1019.0 | Kalumburu | -14.3 | 126.65 | 1941 | 33.678851 | 19.6 | 0 |
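To check whether such days are becoming more frequent, the daily flag can be summed per station and year and then averaged across stations. This is a rough sketch on invented data, with `hot_day` as a hypothetical, shorter name for the flag column. Note that a missing reading compares as False and is therefore counted as a non-hot day.

```python
import numpy as np
import pandas as pd

# Toy daily readings for two stations over two years (values are invented)
demo = pd.DataFrame({
    "site name": ["A", "A", "A", "B", "B", "B"],
    "year": [2018, 2019, 2019, 2018, 2019, 2019],
    "max": [41.0, 40.0, 39.9, 35.0, 42.5, np.nan],
})

# Same tagging rule as above: 1 when the maximum is at or above 40°C
demo["hot_day"] = np.where(demo["max"] >= 40, 1, 0)

# Number of 40°C+ days for each station in each year
days_per_site_year = demo.groupby(["site name", "year"])["hot_day"].sum()

# Average across stations gives one value per year, ready to plot as a trend
avg_days_per_year = days_per_site_year.groupby(level="year").mean()
print(avg_days_per_year)
```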
Hottest Places in Australia (2019)
Let’s start by finding the Australian locations with the most days of extreme heat (40°C / 104 °F or above) in 2019.
#Slice dataframe and select the top 10 locations
df_hottest = dfheat[dfheat['year'] == 2019]
#numeric_only=True skips the datetime 'date' column, which cannot be summed
df_hottest = df_hottest.groupby(["site name"]).sum(numeric_only=True)
df_hottest = df_hottest.sort_values('Average Days of 40C+ / Year', ascending=False)
df_hottest = df_hottest.reset_index().loc[:9, :]
df_hottest
| | site name | max | site number | lat | lon | year | long term average | min | Average Days of 40C+ / Year |
---|---|---|---|---|---|---|---|---|---|
0 | Marble Bar | 13335.4 | 1498690.0 | -7730.70 | 43708.75 | 736935 | 12923.318800 | 7473.3 | 154 |
1 | Rabbit Flat | 13120.6 | 5718090.0 | -7365.70 | 47453.65 | 736935 | 12218.645090 | 5216.3 | 139 |
2 | Karijini North | 12920.3 | 1860770.0 | -8139.50 | 43234.25 | 736935 | 11679.412979 | 7623.9 | 127 |
3 | Tennant Creek Airport | 12160.2 | 5524275.0 | -7168.60 | 48975.70 | 736935 | 11404.563166 | 7292.2 | 92 |
4 | Victoria River Downs | 12802.4 | 5411125.0 | -5986.00 | 47818.65 | 736935 | 12676.517644 | 7079.4 | 89 |
5 | Camooweal Township | 12520.6 | 13508650.0 | -7270.80 | 50413.80 | 736935 | 11865.371179 | 6600.0 | 88 |
6 | Halls Creek Airport | 12645.8 | 758835.0 | -6653.95 | 46599.55 | 736935 | 11876.945578 | 6926.7 | 77 |
7 | Oodnadatta Airport | 11400.5 | 6220695.0 | -10059.40 | 49439.25 | 736935 | 10442.994556 | 5205.0 | 74 |
8 | Birdsville Airport | 11622.4 | 13879490.0 | -9453.50 | 50862.75 | 736935 | 11004.352290 | 5546.8 | 74 |
9 | Learmonth Airport | 12208.1 | 1827555.0 | -8117.60 | 41646.50 | 736935 | 11368.705666 | 6463.8 | 74 |
The fact that Marble Bar (located in Western Australia) tops this list shouldn’t be a surprise. Marble Bar has a hot desert climate with sweltering summers and warm winters. Most of the annual rainfall occurs in the summer. The town set a world record for the most consecutive days of 100 °F (37.8 °C) or above: 160 days, from 31 October 1923 to 7 April 1924.
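In the ranking table above, only the last column carries meaning; the summed latitudes, longitudes and years are byproducts of summing every column. A leaner variant, sketched here on made-up data with the hypothetical column names `hot_day` and `days_40c_plus`, aggregates only the flag column and keeps the top entries with `nlargest`.

```python
import pandas as pd

# Toy flags for three stations in one year (values are invented)
demo = pd.DataFrame({
    "site name": ["A", "A", "B", "B", "C", "C"],
    "year": [2019] * 6,
    "hot_day": [1, 1, 0, 1, 0, 0],
})

top_sites = (
    demo[demo["year"] == 2019]
    .groupby("site name")["hot_day"]  # aggregate only the column of interest
    .sum()
    .nlargest(2)                      # top 2 here; use 10 for a top-ten list
    .reset_index(name="days_40c_plus")
)
print(top_sites)
```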
#Create barchart visualisation using Plotly
colors = ['lightsalmon',] * 10
colors[0] = 'indianred'
fig = go.Figure([go.Bar(x=df_hottest['site name'], y=df_hottest["Average Days of 40C+ / Year"], marker_color=colors)])
fig.update_yaxes(title="Days at or above 40°C in 2019")
fig.update_layout(title_text= "Australia's Hottest Towns: Most Frequent Days Over 40°C (104 °F)", height=450, width=800)
fig.show()