The Commonwealth Scientific and Industrial Research Organisation (CSIRO) has developed a new data privacy tool to help ensure key datasets — such as those tracking COVID-19 — can be publicly shared with an extra layer of security for sensitive personal information.
CSIRO’s Data61, the digital specialist arm of the Agency, along with the NSW Government, the Australian Computer Society (ACS) and several other groups contributed to the tool.
It assesses the risks to an individual’s data within any dataset, allowing targeted and effective protection mechanisms to be put in place.
Chief Data Scientist at the NSW Government, Ian Oppermann, said the Personal Information Factor (PIF) tool used a sophisticated data analytics algorithm to identify the risk that sensitive, de-identified and personal information within a dataset could be re-identified and matched to its owner.
“The early version of the tool is already being used by the NSW Government to analyse datasets tracking the spread of COVID-19 in the State and apply appropriate levels of protection before the data is released as open data,” Dr Oppermann said.
“There’s no other piece of software like the PIF tool.”
He said it was developed “through a long and very collaborative process involving many State, Commonwealth and industry colleagues. CSIRO’s Data61 really brought it to life and made it useable”.
Project lead researcher and Senior Research Scientist at Data61, Sushmita Ruj, said new methods of data de-identification could provide enhanced levels of data privacy and ensure data involving personal information was protected.
“Having studied other privacy metrics, the team concluded a one-size-fits-all approach to estimating the re-identification risks of unique applications of data can be significantly improved upon,” Dr Ruj said.
“The evolving approach to a PIF takes a tailored approach to each dataset by considering various attack scenarios used to re-identify information. The tool then assigns a PIF score to each set.”
Dr Ruj said if the PIF was higher than a desired threshold, the program made recommendations on how to design a more secure framework so the dataset could be certified as safe for public release.
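The score-and-threshold workflow described above can be illustrated with a minimal Python sketch. This is not the PIF algorithm, whose details are not given here. It swaps in a simple uniqueness measure (one divided by the number of records sharing the same quasi-identifier values) as a stand-in risk score. The function names, field names, threshold and sample records are all hypothetical and made up for illustration.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Stand-in risk score (NOT the PIF): worst-case 1 / group size,
    where a group is the set of records sharing the same values for
    the chosen quasi-identifier fields. Assumes records is non-empty."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    return max(1.0 / group_sizes[k] for k in keys)


def check_release(records, quasi_identifiers, threshold):
    """Compare the score with a chosen threshold and return a
    (score, advice) pair. The advice strings are placeholders."""
    score = reidentification_risk(records, quasi_identifiers)
    if score > threshold:
        return score, "generalise or suppress quasi-identifiers before release"
    return score, "within threshold"


# Hypothetical toy records, invented for this example only.
records = [
    {"age_band": "30-39", "postcode": "2000", "status": "confirmed"},
    {"age_band": "30-39", "postcode": "2000", "status": "recovered"},
    {"age_band": "70-79", "postcode": "2480", "status": "confirmed"},
]

score, advice = check_release(records, ["age_band", "postcode"], threshold=0.5)
print(score, advice)
```

In this toy data the third record is the only one with its combination of age band and postcode, so it drives the score above the threshold and triggers a recommendation. A real assessment would, as the researchers describe, consider several attack scenarios rather than a single uniqueness count.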