Patient files may contain vital hints for detecting diseases at an early stage. However, evaluating them would violate patient privacy. This is where mathematics can help.
Evaluating collated patient data without disclosing any sensitive information about individuals poses a considerable challenge. The team headed by Prof Dr Hans Simon from the Horst Görtz Institute for IT Security at the Ruhr-Universität Bochum has developed a method that does precisely that. The mathematicians distort the data in such a way that individual patients remain anonymous during analysis. Nevertheless, self-learning computer programmes are able to detect correlations in the distorted data almost as well as in the original data.
In principle, the distortion works as follows: a die is rolled for each patient file, and the number rolled is added to every value in that file. This alters individual records significantly and unpredictably, but in the best case it affects the statistical summaries no more than the random fluctuation that is already present in the data.
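The dice analogy can be made concrete with a small simulation. The sketch below is a toy illustration of that principle only, not the researchers' actual mechanism: the patient files (age and weight) are synthetic and hypothetical, and the helper names perturb_files and column_mean are invented for this example. Each file gets one die roll that is added to all of its values, so individual records change visibly, while the column means, after subtracting the die's known expected value of 3.5, stay close to the true means.

```python
import random
import statistics

DIE_MEAN = 3.5  # expected value of a fair six-sided die


def perturb_files(files, rng):
    """Toy scheme: roll one die per file and add it to every value in that file."""
    noisy = []
    for record in files:
        offset = rng.randint(1, 6)  # one roll per patient file
        noisy.append([value + offset for value in record])
    return noisy


def column_mean(files, col):
    """Mean of one column (e.g. all ages) across all files."""
    return statistics.fmean(record[col] for record in files)


rng = random.Random(42)
# Hypothetical patient files: [age in years, weight in kg]
files = [[rng.gauss(50, 15), rng.gauss(80, 12)] for _ in range(10_000)]
noisy = perturb_files(files, rng)

print("first file, original:", [round(v, 1) for v in files[0]])
print("first file, noisy:   ", [round(v, 1) for v in noisy[0]])
for col, name in enumerate(["age", "weight"]):
    # Subtract the known average shift to recover an estimate of the true mean
    estimate = column_mean(noisy, col) - DIE_MEAN
    print(f"{name}: true mean {column_mean(files, col):.2f}, estimate {estimate:.2f}")
```

A single die roll shifts one record by up to six units, but averaged over thousands of files the rolls contribute only a small fluctuation around their known mean.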
For their work, the researchers at the Chair of Theoretical Computer Science established a precise mathematical definition of what it means for patients to remain anonymous, and of what it means for the results obtained from distorted and undistorted data not to deviate strongly from each other. To meet these requirements, the mathematicians translated the problem into a geometric representation.
Data represented as vectors
Each patient file was represented as a vector, i.e. an arrow in a geometric space. The evaluation algorithm was only permitted to ask Yes/No questions, such as: Does the patient smoke? Does the patient weigh more than 80 kilograms? Each of these questions was likewise represented as a vector. An obtuse angle between the file vector and the question vector stood for a No response; an acute angle stood for a Yes response.
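The angle rule can be checked with the dot product: the angle between two vectors is acute exactly when their dot product is positive and obtuse when it is negative. The sketch below assumes a hypothetical encoding that the article does not specify: a file is the vector (smokes as 0 or 1, weight in kg, constant 1), and each question vector is chosen so that its dot product with a file crosses zero at the relevant threshold. Treating an exact right angle as No is also an assumption of this sketch.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    cos_theta = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def answer(file_vec, question_vec):
    """Yes for an acute angle (positive dot product); No otherwise (assumed for 90 degrees)."""
    return "Yes" if dot(file_vec, question_vec) > 0 else "No"


# Hypothetical encoding: file = (smokes: 0/1, weight in kg, constant 1)
patient = (1.0, 72.0, 1.0)
SMOKES = (1.0, 0.0, -0.5)            # dot product = smokes - 0.5
WEIGHT_OVER_80 = (0.0, 1.0, -80.0)   # dot product = weight - 80

for name, question in [("Smokes?", SMOKES), ("Weighs more than 80 kg?", WEIGHT_OVER_80)]:
    print(f"{name} {answer(patient, question)} "
          f"(angle {angle_degrees(patient, question):.2f} degrees)")
```

For this smoker weighing 72 kg, the first angle is slightly below 90 degrees (Yes) and the second slightly above (No).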
Rather than distorting the original data, the researchers applied the distortion only after they had converted the data into vectors. In this way, information about individual patients could be kept anonymous, while the researchers were still able to make statistical statements about the collated data of all patients.
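The effect of distorting the vectors rather than the raw records can be illustrated with a naive sketch. Here the distortion is assumed to be independent Gaussian noise added to each coordinate of every file vector, which is a stand-in chosen for illustration and not the published method; the file vectors, the question vector and the noise level sigma are all hypothetical. Many individual Yes/No answers change, while the share of Yes answers across all files changes only slightly.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def add_noise(vec, sigma, rng):
    """Hypothetical distortion: add independent Gaussian noise to each coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def yes_fraction(files, question):
    """Share of files whose angle with the question is acute (dot product > 0)."""
    return sum(dot(f, question) > 0 for f in files) / len(files)


rng = random.Random(7)
# Synthetic file vectors scattered around a hypothetical centre
centre = [0.5, 0.2, -0.1]
files = [[c + rng.gauss(0.0, 1.0) for c in centre] for _ in range(20_000)]
question = [1.0, 0.0, 0.0]  # hypothetical unit-length question vector

noisy_files = [add_noise(f, sigma=0.3, rng=rng) for f in files]

# Count individual answers that the distortion flipped from Yes to No or vice versa
flipped = sum((dot(f, question) > 0) != (dot(g, question) > 0)
              for f, g in zip(files, noisy_files))
print(f"Yes share, original:  {yes_fraction(files, question):.3f}")
print(f"Yes share, distorted: {yes_fraction(noisy_files, question):.3f}")
print(f"individual answers changed: {flipped} of {len(files)}")
```

With this naive noise the aggregate Yes share is slightly biased and the bias grows with sigma; bounding such deviations precisely is what the researchers' definitions are for, and this sketch does not reproduce their guarantees.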