After enduring the hottest Australian summer on record in 2019/20, sheltered inside with the air-con and air purifier running while large parts of my home state of New South Wales burned, I decided to run some exploratory analysis on the Bureau of Meteorology’s fantastic ACORN-SAT 2.1 database.

The Australian Climate Observations Reference Network – Surface Air Temperature (ACORN-SAT) is the dataset used by the Bureau of Meteorology to monitor long-term temperature trends in Australia. ACORN-SAT uses observations from 112 weather stations in all corners of Australia, selected for the quality and length of their available temperature data.

Setup

Let’s get started by importing the required packages:

import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chart_studio.plotly as py
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import seaborn as sns
import string
import glob
import plotly.io as pio

#Load the Plotly JavaScript into the notebook so figures render offline
init_notebook_mode(connected=False)

Some Plotly visualisations require a Mapbox account - you can register for the basic account tier for free.

#You will need your own token
with open(".mapbox_token") as token_file:
    token = token_file.read().strip()

We then create a dataframe from a CSV containing a list of weather station names and locations.

places = pd.read_csv("acorn_sat_v2.1.0_stations.csv")
places.head()
   stn_num      stn_name     lat     lon  elevation
0     1019     Kalumburu  -14.30  126.65         23
1     2079   Halls Creek  -18.23  127.67        409
2     3003        Broome  -17.95  122.24          7
3     4032  Port Hedland  -20.37  118.63          6
4     4106    Marble Bar  -21.18  119.75        182

The dataset of temperature readings comes split into ‘max’ and ‘min’ folders, each containing one CSV per weather station. The code below loops through each CSV, creates a dataframe for each station and runs some pre-processing, before concatenating them all together.

#Loop through the weather station CSVs, run some preprocessing and add the dataframes to a temporary list
path = "max"
li = []
#Index the station list by station number so coordinates are matched by ID, not by file order
coords = places.set_index('stn_num')

#glob returns files in arbitrary order, so sort them to process stations in a fixed order
for f in sorted(glob.glob(path + "/*.csv")):
    df = pd.read_csv(f, index_col=None, header=0)
    #The first row holds the site number and name; copy them to every row
    site_number = int(df['site number'][0])
    df['site number'] = df['site number'][0]
    df['site name'] = string.capwords(df['site name'][0])
    df['lat'] = coords.loc[site_number, 'lat']
    df['lon'] = coords.loc[site_number, 'lon']
    df = df.drop(df.index[0])
    df['year'] = pd.to_datetime(df['date']).dt.year
    #Baseline: mean maximum temperature over 1950-1979
    baseline = (df['year'] > 1949) & (df['year'] < 1980)
    df['long term average'] = df.loc[baseline, 'maximum temperature (degC)'].mean()
    li.append(df)

#Concatenate list of dataframes into one large dataframe of nearly 4 million rows.
df = pd.concat(li, axis=0, ignore_index=True)
#Do likewise with the weather station data from the 'min' dataset
path = "min"
li = []

#Use the same sorted order as the 'max' loop so the rows line up
for f in sorted(glob.glob(path + "/*.csv")):
    df_min = pd.read_csv(f, index_col=None, header=0)
    df_min = df_min.drop(df_min.index[0])
    li.append(df_min)

df_min = pd.concat(li, axis=0, ignore_index=True)
#Add the minimum temperature data to the first dataframe
#This matches rows by position, so both folders must hold the same stations and dates
assert len(df) == len(df_min), "max and min datasets have different lengths"
df['min'] = df_min['minimum temperature (degC)']
df.rename(columns = {"maximum temperature (degC)": "max"}, inplace=True)

df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
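
The code above attaches the minimum temperatures by row position, which only works if both folders produce the same stations and dates in the same order. A more defensive alternative is to join on station number and date. The sketch below illustrates the idea on made-up data; the frame names `max_df` and `min_df` and all values are hypothetical, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the concatenated 'max' and 'min' frames (values are made up).
max_df = pd.DataFrame({
    "site number": [1019, 1019, 2079],
    "date": pd.to_datetime(["1941-09-01", "1941-09-02", "1941-09-01"]),
    "max": [31.0, 31.0, 35.2],
})
min_df = pd.DataFrame({
    # Deliberately in a different row order, and with one day missing.
    "site number": [2079, 1019],
    "date": pd.to_datetime(["1941-09-01", "1941-09-01"]),
    "min": [18.4, 20.5],
})

# A left join keeps every 'max' row; days without a 'min' reading become NaN.
merged = max_df.merge(min_df, on=["site number", "date"], how="left")
print(merged)
```

With a key-based join, a station whose ‘min’ file starts on a different date, or files read in a different order, would show up as missing values rather than silently misaligned readings.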

Exploring the Australian Bureau of Meteorology data

We start by printing some basic statistics describing the dataset:

df.head()
date max site number site name lat lon year long term average min
0 1941-09-01 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5
1 1941-09-02 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5
2 1941-09-03 30.5 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.3
3 1941-09-04 38.8 1019.0 Kalumburu -14.3 126.65 1941 33.678851 21.0
4 1941-09-05 32.1 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.6
df.dtypes
date                 datetime64[ns]
max                         float64
site number                 float64
site name                    object
lat                         float64
lon                         float64
year                          int64
long term average           float64
min                         float64
dtype: object
df.describe(include=[object])
site name
count 3882887
unique 112
top Cairns Aero
freq 40177

About 2.7% of the maximum temperature readings in this dataset are missing (103,673 of 3,882,887). Given that we are exploring mean temperature values, this is not especially critical.

pd.isnull(df['max']).value_counts()
False    3779214
True      103673
Name: max, dtype: int64
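
As a small illustration of why the gaps are tolerable for averages: pandas skips missing values by default when computing a mean, so a few absent days do not turn the result into NaN. The readings below are made up. Note this only protects the calculation itself; if the gaps were concentrated in one season, the mean could still be skewed.

```python
import numpy as np
import pandas as pd

# Made-up daily maxima with one missing reading.
readings = pd.Series([30.0, np.nan, 34.0])

mean_skipping_nan = readings.mean()          # NaN is ignored by default (skipna=True)
mean_with_nan = readings.mean(skipna=False)  # NaN propagates into the result
share_missing = readings.isna().mean()       # fraction of readings that are missing

print(mean_skipping_nan, mean_with_nan, share_missing)
```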

Visualising Extreme Heat Events

These are the days when you just can’t afford for your air-con to break. When koalas come down from their gum trees for a swim in the billabong. Using the BoM dataset, let’s find out where in Australia you would be better off living underground, and whether occurrences of 40°C+ (104 °F+) days are on the rise.

dfheat = df.copy()

#Tag days where the temperature reading reached or exceeded 40°C (104 °F)
dfheat['Average Days of 40C+ / Year'] = np.where(dfheat['max'] >= 40, 1, 0)

dfheat.head()
date max site number site name lat lon year long term average min Average Days of 40C+ / Year
0 1941-09-01 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5 0
1 1941-09-02 31.0 1019.0 Kalumburu -14.3 126.65 1941 33.678851 20.5 0
2 1941-09-03 30.5 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.3 0
3 1941-09-04 38.8 1019.0 Kalumburu -14.3 126.65 1941 33.678851 21.0 0
4 1941-09-05 32.1 1019.0 Kalumburu -14.3 126.65 1941 33.678851 19.6 0
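
Because each day is tagged with a 1 or a 0, summing the flag within a group gives the number of 40°C+ days for that group. Here is a minimal sketch on made-up readings for two hypothetical stations; the column name `hot day` is a shorthand used only in this example.

```python
import numpy as np
import pandas as pd

# Made-up daily readings for two hypothetical stations.
toy = pd.DataFrame({
    "site name": ["A", "A", "A", "B", "B", "B"],
    "year": [2018, 2019, 2019, 2019, 2019, 2019],
    "max": [41.0, 39.9, 40.0, 42.5, np.nan, 44.1],
})

# Same tagging rule as above: 1 when the maximum is at least 40, else 0.
# A missing reading compares as False, so it is counted as 0.
toy["hot day"] = np.where(toy["max"] >= 40, 1, 0)

# Summing the flag per station and year gives the number of 40°C+ days.
hot_days = toy.groupby(["site name", "year"])["hot day"].sum()
print(hot_days)
```

One consequence worth keeping in mind is that days with no reading are treated as not hot, so counts for stations with many gaps are a lower bound.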

Hottest Places in Australia (2019)

Let’s start by finding the Australian locations with the most days of extreme heat (40°C / 104 °F or above) in 2019:

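#Slice dataframe and select the top 10 locations
df_hottest = dfheat[dfheat['year'] == 2019]
#numeric_only=True skips the date column, which cannot be summed
#Only the final column (the count of 40C+ days) is meaningful; the other sums are by-products
df_hottest = df_hottest.groupby("site name").sum(numeric_only=True)
df_hottest = df_hottest.sort_values('Average Days of 40C+ / Year', ascending=False)
df_hottest = df_hottest.reset_index().head(10)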

df_hottest
site name max site number lat lon year long term average min Average Days of 40C+ / Year
0 Marble Bar 13335.4 1498690.0 -7730.70 43708.75 736935 12923.318800 7473.3 154
1 Rabbit Flat 13120.6 5718090.0 -7365.70 47453.65 736935 12218.645090 5216.3 139
2 Karijini North 12920.3 1860770.0 -8139.50 43234.25 736935 11679.412979 7623.9 127
3 Tennant Creek Airport 12160.2 5524275.0 -7168.60 48975.70 736935 11404.563166 7292.2 92
4 Victoria River Downs 12802.4 5411125.0 -5986.00 47818.65 736935 12676.517644 7079.4 89
5 Camooweal Township 12520.6 13508650.0 -7270.80 50413.80 736935 11865.371179 6600.0 88
6 Halls Creek Airport 12645.8 758835.0 -6653.95 46599.55 736935 11876.945578 6926.7 77
7 Oodnadatta Airport 11400.5 6220695.0 -10059.40 49439.25 736935 10442.994556 5205.0 74
8 Birdsville Airport 11622.4 13879490.0 -9453.50 50862.75 736935 11004.352290 5546.8 74
9 Learmonth Airport 12208.1 1827555.0 -8117.60 41646.50 736935 11368.705666 6463.8 74

The fact that Marble Bar (located in Western Australia) tops this list shouldn’t be a surprise. Marble Bar has a hot desert climate with sweltering summers and warm winters, and most of its annual rainfall occurs in summer. The town set a world record for the most consecutive days at or above 100 °F (37.8 °C): 160 days, from 31 October 1923 to 7 April 1924.
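
The 160-day figure can be checked with simple date arithmetic, and the same kind of streak could in principle be searched for in the daily data. The sketch below does both on made-up readings; `longest_run` is a hypothetical helper written for this example, not a pandas function.

```python
from datetime import date

import pandas as pd

# Length of the record period, counting both the first and the last day.
record_days = (date(1924, 4, 7) - date(1923, 10, 31)).days + 1


def longest_run(flags: pd.Series) -> int:
    """Return the length of the longest unbroken run of True values."""
    flags = flags.astype(bool)
    # Every False value starts a new group, so each streak of True values
    # falls inside a single group.
    groups = (~flags).cumsum()
    # Within a group, the running sum of the flags is the current streak length.
    streaks = flags.astype(int).groupby(groups).cumsum()
    return int(streaks.max()) if len(streaks) else 0


# Made-up maxima: the longest streak at or above 37.8 °C is three days.
toy_max = pd.Series([38.0, 39.1, 36.0, 40.2, 41.0, 37.8, 35.5])
print(record_days, longest_run(toy_max >= 37.8))
```

Applied to a single station's `max` column, a helper like this would treat missing readings as breaks in the streak, which is a conservative assumption.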

#Create barchart visualisation using Plotly

#Highlight the top-ranked town in a darker colour
colors = ['lightsalmon'] * len(df_hottest)
colors[0] = 'indianred'

fig = go.Figure([go.Bar(x=df_hottest['site name'], y=df_hottest["Average Days of 40C+ / Year"], marker_color=colors)])

fig.update_yaxes(title="Days at or above 40°C in 2019")
fig.update_layout(title_text= "Australia's Hottest Towns: Most Frequent Days Over 40°C (104 °F)", height=450, width=800)


fig.show()