The Australian Government says the national value of open data could be as high as $25 billion per year. Globally, McKinsey puts the value at $3 trillion. It’s little wonder stakeholders are clamouring for their share, attempting to devise systems which enable the transfer and reuse of that data.
However, data sharing is fraught with risk that varies depending on use case, content and the interpretation of privacy. That much has been underscored by a series of private sector scandals and failed government data initiatives which, lacking the necessary checks and balances, have eroded public confidence in institutions’ ability to share information safely while maintaining users’ privacy.
But a cohort of Australian public and private sector stakeholders, led by the NSW Government and the Australian Computer Society, has developed an initial framework for the safe sharing of data among government agencies, researchers and industry in controlled environments, with the expectation it could be developed for broader use cases.
The work is a world first, according to the group which released a whitepaper late last year outlining a data sharing framework that will “optimise privacy and enhance public and consumer trust in order to unleash the untapped value of data for the Australian economy”.
However, critics have told Which-50 the framework lacks appropriate risk management and the necessary technical specifications, and could create a dangerous culture of box-ticking compliance.
The data sharing challenge
The framework posited by the ACS group attempts to outline a way to share de-identified data within a controlled environment in a way which preserves people’s privacy. It is a more problematic concept than it may first appear.
Datasets containing de-identified data are relatively safe in terms of respecting individuals’ privacy. But linking one set to one or more others can eventually re-identify individuals through what is known as a “mosaic” or “linkage” effect. For example, a dataset containing health information, although de-identified, could be cross-referenced with other datasets, eventually matching key data points and revealing further sensitive information.
There’s even a possibility of revealing the location of individuals and putting them at risk of harm.
One example put to Which-50 was a domestic violence situation where matching data sets could reveal a partner’s location, making the risks of sharing de-identified data anything but trivial. Importantly, the ACS framework is so far only concerned with the sharing of data in controlled environments rather than “data release” to the public or third parties where re-identification risks are higher.
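The linkage effect described above can be sketched in a few lines. This is a minimal illustration with invented data: neither record set contains the sensitive set’s names, yet joining them on shared quasi-identifiers (postcode, birth year, sex) ties a diagnosis to a named individual.

```python
# Hypothetical de-identified health records: names stripped,
# but quasi-identifiers retained.
health_records = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "2010", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
]

# A second, separately published set that does carry identities
# (e.g. an electoral roll).
electoral_roll = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "name": "Alice Smith"},
    {"postcode": "2031", "birth_year": 1990, "sex": "M", "name": "Bob Jones"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

def link(records_a, records_b, keys):
    """Join two record sets on matching quasi-identifier values."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    matches = []
    for rec in records_a:
        hit = index.get(tuple(rec[k] for k in keys))
        if hit is not None:
            matches.append({**rec, **hit})  # merged record: diagnosis + name
    return matches

reidentified = link(health_records, electoral_roll, QUASI_IDENTIFIERS)
for person in reidentified:
    print(person["name"], "->", person["diagnosis"])  # Alice Smith -> diabetes
```

With only three shared attributes, one of the two “safe” records is uniquely re-identified; real linkage attacks work the same way at scale, across many more datasets.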
A key part of the data sharing challenge is finding the “safe” level of personally identifiable information for a particular set. Too much information and the data is easily reconstructed to an identifiable level, but too little harms the utility of the data.
“The test in every jurisdiction I’ve come across is a ‘reasonableness’ test,” says Dr Ian Oppermann, the NSW Chief Data Scientist leading the new initiative.
“Can you reasonably identify an individual in this data set? And in NSW it’s living or dead [in] the last 30 years.”
Historically, the approach has been to strip out clearly personal information such as names, addresses and telephone numbers, a relatively effective measure for smaller datasets, Oppermann told Which-50.
“[But] when you’re linking together tens or even hundreds of data sets that reasonable test starts to break down.”
People also rightly become more concerned as linkage increases, because oversight of who is using the data, and for what purpose, becomes less clear.
Oppermann says the new initiative is essentially trying to do two things: first, identify how much personal information is in linked datasets; and second, determine the least aggregation or perturbation required to reach a safe level of identifiable information.
In this regard there is no universal level of safety and the quantifiable level will vary depending on who is using the data and what they are using it for.
But identifying the level of personal information and what needs to be done could, Oppermann says, allow a single data set to be used for multiple reasons while still retaining a reasonable safety level. Or at least provide more clarity around risk and governance.
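The whitepaper does not prescribe a specific mechanism for finding the “least aggregation” to reach a safe level. A common textbook illustration of the idea is k-anonymity-style generalisation: coarsen quasi-identifiers step by step until every record shares its values with at least k−1 others. The generalisation levels and sample data below are invented for illustration.

```python
from collections import Counter

def generalise(record, level):
    """Coarsen quasi-identifiers: level 0 = raw values,
    level 1 = 10-year age band, level 2 = also mask postcode digits."""
    age, postcode = record
    if level >= 1:
        decade = (age // 10) * 10
        age = f"{decade}-{decade + 9}"
    if level >= 2:
        postcode = postcode[:2] + "**"
    return (age, postcode)

def smallest_safe_level(records, k, max_level=2):
    """Return the least generalisation level at which every
    quasi-identifier group contains at least k records."""
    for level in range(max_level + 1):
        groups = Counter(generalise(r, level) for r in records)
        if min(groups.values()) >= k:
            return level
    return None  # no level is safe; suppress or aggregate further

people = [(34, "2000"), (36, "2007"), (31, "2009"), (62, "2000"), (65, "2004")]
print(smallest_safe_level(people, k=2))  # -> 2
```

Here raw records and age-banded records are still unique, so the function settles on level 2 (age bands plus masked postcodes), the least perturbation at which no one stands alone. The trade-off is exactly the one Oppermann describes: each extra level of generalisation buys safety at the cost of utility.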
“The whole challenge around data sharing [is], rather than saying it’s a black and white situation: there is personal information, you can’t have it; or you need to follow ethics or various other proposals; or I need to aggregate to a point where it’s not that useful anymore.
“We’re trying to get some shades of grey in the middle.”
So far he and his team have hypothesised a “Personal Information Factor”, or PIF. A PIF essentially tells data users what shade of grey they are dealing with. With that information, users could abandon data sharing or proceed with the appropriate risk frameworks built around the data.
Earlier this month an ACS-led hackathon produced a prototype application that allowed a data custodian to visualise and adjust the amount of personally identifying information in a dataset. The ACS group hopes the data sharing framework and such tools can eventually uproot the longstanding reluctance to share data in Australian governments.
Currently, a lack of public confidence and uncertainty about often vague and overlapping privacy laws mean Australian governments remain reluctant to share data with researchers and within government.
“There’s a lot of people who think the Government is already doing this sort of data sharing and linkage work. It’s really not as advanced as people think,” Oppermann says.
While some progress has been made by the NSW Government on data sharing among agencies, the challenges persist particularly around sensitive data, according to the chief data scientist.
“The arguments typically are: unwilling, unable, not allowed,” Oppermann says of governments sharing data.
Government agencies struggle to trust their counterparts with data, have concerns about a lack of capability (much of the data still resides on magnetic tape, for instance) and are wary of treading through a minefield of several overlapping pieces of privacy legislation.
“There are real issues about the clarity of what you’re allowed to do [including] primacy of legislation. But also, [government agencies] all come back to if there’s personal information.
“There is no way to unambiguously tell, at the moment, if there’s personal information.”
But providing the data sharing framework to enable that will only be part of the solution. Oppermann acknowledges public buy-in will also be crucial to any data sharing success.
“There’s a whole lot of work we can do about telling people what we’re doing, why we are doing it, which data sets we are using in anticipation of what outcome.
“There’s an inclusiveness on the journey of developing new digital services that we need to get a whole lot better at. And talk about what happens when things go wrong; what happens if systems crash or unintended consequences happen, and how do you have your right of redress which doesn’t rely on digital channels to have that right of reply.”
The new framework is an extremely important piece of the data sharing puzzle, but it alone has limited use, and over-reliance on it can create a risk of “tick-box” data sharing, according to Dr Dali Kaafar, Professor of Privacy Preserving Technologies at Macquarie University.
Kaafar told Which-50 that presenting a data sharing framework as a comprehensive solution risks further embedding cultural problems around data sharing. It is better thought of as guidelines or best practice, Kaafar says, and in that regard has limited use.
“By the promotion of some of these frameworks as the ultimate weapon or as the way of doing things, it may create some false sense of security that could be inducing some organisations to just ticking the box.”
He explained that organisations, data custodians and developers can become complacent in the belief they are complying with data sharing frameworks, conflating compliance with actual security and risk management.
“They miss all the subtle aspects about the data sharing and they end up, for example, developing some badly shaped or faulty algorithms that will end up in some data breach or privacy violation,” Kaafar said.
“My concern is that this [framework] would be sitting in people’s cupboards and this all makes sense but we don’t necessarily have the tools to implement it.”
The problem is not specific to Australia or the ACS approach, he said. Rather, it is a universal problem stemming from the complex notion of privacy.
“It is a very complex environment where we deal with individuals data and where I think we shouldn’t be talking about [only] one approach. We should be more talking about a data specific approach.”
A data specific approach to data sharing isn’t necessarily onerous, Kaafar says, but does require a more technical approach than is currently provided.
“Frameworks and governance frameworks are definitely part of the answer but only part of it … Without a solid and provable technical approach these frameworks would induce, or would actually become, a threat. Certainly an unintended threat to the privacy and security of the data because of the false sense of security that they are invoking.”
Kaafar is advocating for a more holistic approach to data sharing, one he describes as “pragmatic optimism”.
“Pragmatic” refers to the increasingly obvious need for, and benefit of, data sharing, while “optimism” acknowledges that the technology to protect privacy exists and is improving, according to Kaafar.
“We have to find the right trade off that satisfies both an acceptable utility out of the insights but also, and very very importantly, an acceptable privacy guarantee that would define how much privacy threats are there or how much privacy guarantees are there in a very very rigorous mathematical way.”
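Kaafar does not name a specific technique here, but differential privacy is the standard formalism for expressing a privacy guarantee “in a very rigorous mathematical way”: query answers are perturbed with noise calibrated to the query’s sensitivity and a privacy budget epsilon, where smaller epsilon means stronger privacy and lower utility. A minimal sketch, with an invented dataset and epsilon values:

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Differentially private count: true count plus Laplace(1/epsilon) noise.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon yields an epsilon-differentially-private answer.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Inverse-transform sample from Laplace(0, 1/epsilon).
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

ages = [23, 37, 41, 58, 62, 29, 45]
# Smaller epsilon -> more noise -> stronger privacy, lower utility.
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))
```

The trade-off Kaafar describes is the choice of epsilon: analysts still get a usable count, while the guarantee about any one individual’s presence in the data is stated, and bounded, mathematically.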
A lack of risk management
Dr Bernard Robertson-Dunn, the Australian Privacy Foundation’s Health Committee Chair, says appropriate risk management is conspicuously absent from the ACS whitepaper which takes a “technology up” approach, without enough consideration for responsibility, consequential costs, and reporting frameworks.
“In broad terms, risk management is the issue,” he told Which-50.
“[The ACS whitepaper] includes the phrase but does not discuss it. And any framework for data sharing should, in my opinion, have a risk management framework that is comprehensive so that everybody knows what it is [and] it’s completed before the data sets are shared, so you know in advance what the risks are.”
The privacy advocate argues Australia’s recent data sharing push is sacrificing safety for speed and many checks and balances are being left out, one of them being risk management.
“You could say that every person whose data is within the dataset should be contacted and asked for explicit consent for a particular project. That is obviously onerous, so whoever is now managing the risk associated with his project is taking a risk [by not asking individuals].”
“Where is the framework that describes that risk and how they manage that risk?”
He says the ACS framework largely reflects its membership of larger organisations and technical individuals, but to some extent discounts the general public, which creates the data and ultimately benefits from rigorous risk management.
According to Robertson-Dunn, a data sharing analysis which purports to have “presented frameworks that address major issues of data sharing” should also have included an extensive consideration of risk management. And while the ACS expects to explore these areas further, Robertson-Dunn argues the fundamental importance of risk management means it should have been included alongside the framework from the start.
“This is far more than a technical issue. It is to do with privacy, which is not strictly technical, it is to do with risk management — responsibility, accountability, and legal issues,” Robertson-Dunn says. “There needs to be more analysis, discussion and agreement about taking risks with data.”