The CSIRO’s digital arm Data61 today provided an update on its data privacy tool that uses AI to help share important datasets publicly in a safer way.
Still under development, the tool is already being used by the NSW government to analyse COVID-19 spread information, domestic violence data collected during the COVID-19 lockdown, and public transport usage for potential sharing. It won't be made available to the wider public until 2022.
Known as the Personal Information Factor (PIF) tool, the software uses an algorithm to assess the risk that sensitive, de-identified personal information within a dataset could be re-identified and matched to its owner.
PIF generates a "score" for each dataset, which is then used to determine whether the data can be safely shared and what protection mechanisms should be applied before sharing.
PIF is the result of a multi-year project between Data61, state governments, the Australian Computer Society (ACS), and several other groups.
The ACS first proposed a privacy-enhancing tool for data sharing in 2018. Which-50 revealed the PIF project in 2019, including concerns that an automated approach could lead to a box-ticking compliance culture for data sharing.
In 2020 the CSIRO’s Data61 joined the project along with the Cyber Security Cooperative Research Centre.
Unprecedented privacy tool
Today the tool was presented as an evolving way for privacy experts, who traditionally undertake privacy risk assessments manually, to validate their work with computer models.
“There’s no other piece of software like the PIF tool,” said Dr Ian Oppermann, the NSW Government’s Chief Data Scientist who has led the development of PIF.
“Every day, it helps us analyse the security and privacy risks of releasing de-identified datasets of people infected with COVID-19 in NSW and the testing cases for COVID-19, allowing us to minimise the re-identification risk before releasing to the public.”
The CSCRC’s Research Director, Professor Helge Janicke, said PIF offered a new “scale on which you can understand the risk” of sharing information.
“Data analysis is well understood but how good the output is once shared is very difficult to understand.
“Hence, the metrics-based approach and analysis that underpins PIF is hugely valuable in achieving the ethical and responsible sharing of critical data, with this technology allowing data owners to fully assess the risks and residual impacts associated with data sharing.”