Macquarie University researchers found a way to identify individuals and their census responses, Which-50 has learned. The vulnerability means a large-scale attack could be used to reconstruct the entire census database. The ABS claims that the flaw has subsequently been addressed, but it’s not entirely clear that the measures taken are enough.
Data collected in the 2016 census was vulnerable to hackers through the exploitation of the Australian Bureau of Statistics data visualisation tool, according to a new study from Macquarie University.
The exploit made it possible for attackers to identify individuals and their census responses, including sensitive personal information, when it was combined with a minimal amount of background information, according to the researchers.
The ABS confirmed the finding but said it has addressed the issue, which the researchers flagged last year along with a proposed mitigation strategy. The ABS said it is not aware of any person’s privacy being compromised.
However, the ABS remedy — restricting some data access and limiting use of the data visualisation tool — would not necessarily prevent more sophisticated attacks in the future, according to the researchers.
A Macquarie University research team, led by Dr Dali Kaafar, identified a vulnerability in the perturbation algorithm used by the ABS for its TableBuilder tool.
TableBuilder is a publicly available online tool provided by the ABS for users to create tables, graphs and maps of census data. When a user inputs a query, results are returned based on real census data but also include a level of “noise”, deliberately introduced through a perturbation algorithm, so the data is not completely accurate and does not identify respondents.
But researchers demonstrated how an attacker could remove the “noise” by conducting multiple queries to determine the algorithm’s parameters and separate noise, eventually revealing accurate census data and making it possible to identify respondents.
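As a rough illustration of the averaging idea described above, the toy sketch below assumes a hypothetical `noisy_count` function that adds fresh, bounded random noise to a true count each time it is called. This is a deliberate simplification, not the ABS perturbation algorithm or the researchers' actual method; the function names, the noise range and the secret value are all invented for the example.

```python
import random

def noisy_count(true_count, rng, max_noise=2):
    """Hypothetical perturbed query: the true count plus bounded integer noise."""
    return true_count + rng.randint(-max_noise, max_noise)

def estimate_count(true_count, n_queries, rng):
    """Average many noisy answers and round to the nearest whole number."""
    answers = [noisy_count(true_count, rng) for _ in range(n_queries)]
    return round(sum(answers) / n_queries)

rng = random.Random(2016)  # fixed seed so the run is repeatable
secret = 37                # made-up value the attacker never sees directly
recovered = estimate_count(secret, 200, rng)
print(recovered)
```

Under these assumptions the random noise tends to cancel out as answers accumulate, so the rounded average settles on the true count. A real system may reuse noise for repeated queries or perturb results in other ways, which is why an attacker would first need to work out how the noise behaves.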
Once an accurate census response has been found, only “a little background information is enough to link them to real persons in the population”.
“The implications here are now suddenly any attacker can craft queries in a specific way to reconstruct the original census data that is being protected by this algorithm,” said research author and Professor of Privacy Preserving Technologies at Macquarie University, Dr Dali Kaafar.
“Removing this protection has been shown by this research to be not only feasible but also actually quite easy,” said Dr Kaafar.
Dr Kaafar, who is also the chief scientist of the Optus Macquarie University cybersecurity hub, told Which-50 that just 200 queries (a relatively low bar, because queries are easily automated) would be enough to give a probability above 95 per cent that the recovered data is exactly accurate.
Having the accurate starting point then allows attackers to add more and more data, eventually identifying individuals and their census responses.
“The more you know the easier it becomes to add more knowledge … Basically you can reconstruct every little thing about that individual, for example,” Kaafar explained.
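The narrowing-down effect Kaafar describes can be pictured as a simple filter over reconstructed records. The sketch below is purely illustrative: the records are invented synthetic rows, the field names are hypothetical, and `link` is not part of any ABS tool or of the researchers' code.

```python
# Invented synthetic records standing in for reconstructed responses.
records = [
    {"area": "X", "age": 34, "occupation": "nurse", "income_band": "C"},
    {"area": "X", "age": 51, "occupation": "teacher", "income_band": "B"},
    {"area": "X", "age": 34, "occupation": "plumber", "income_band": "D"},
    {"area": "Y", "age": 34, "occupation": "nurse", "income_band": "A"},
]

def link(rows, **known):
    """Return the rows consistent with every piece of background knowledge."""
    return [r for r in rows if all(r.get(k) == v for k, v in known.items())]

one_fact = link(records, area="X")
two_facts = link(records, area="X", age=34)
three_facts = link(records, area="X", age=34, occupation="nurse")

print(len(one_fact), len(two_facts), len(three_facts))
```

Each extra fact shrinks the set of candidates. Once a single row remains, every other field in that row is effectively attached to a known person.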
For ethical reasons, the Macquarie University researchers ran their tests on synthetic data sets using a TableBuilder API, but say the technique is applicable to census data through the publicly available web interface.
An ABS spokesperson told Which-50 they had been working with Dr Kaafar on a fix since early 2017.
However, the ABS response may not be enough to prevent more sophisticated attacks in the future, Kaafar said.
“In response to the identified vulnerability, ABS has tried to bring some upcoming changes to the TableBuilder tool … We believe that a more controlled access to TableBuilder is definitely a step in the right direction.
“However, it is not clear exactly how safe these measures would be against a non-naïve attacker who would operate in a stealthy way. Besides, it is extremely important to be able to quantify the residual risks taken, and these fixes are somehow just empirical in nature and not provably robust against our attack or other potential ones. In essence, no one really knows whether these would be secure enough.
“Our recommendation is to adopt provable privacy approaches, rather than probably private ones.”
While most people will rightly be concerned about the risk of identification, there are other potentially more harmful outcomes, according to Dr Kaafar.
“The bigger issue here is this is not only a re-identification risk, from a privacy perspective. This is actually what privacy researchers call a reconstruction attack, where you reconstruct the whole database.”
It means attackers with enough resources, likely at the state actor level, could use the method to reconstruct the entire census database, giving them access to details of the Australian people. The implication then, Kaafar says, is a greater chance of more sophisticated cyber attacks, including phishing, social engineering and the spread of malware.
“When you know details about people you can actually lure them or fool them in so many different ways … Implications go beyond the simple information leakage perspective.”