Jul 2018: RSARC: Trustworthy Computing over Protected Datasets
There has been an unprecedented increase in the quantity of research data available in digital form. Combining these information sources within analyses that leverage cloud computing frameworks and big data analytics platforms has the potential to lead to groundbreaking innovations and scientific insights. As developers and operators of the widely used Dataverse repository and the Massachusetts Open Cloud platform, we have been working to realize this potential by colocating datasets in common platforms, curating and tagging datasets with both functional and legal access policies, offering helper services such as search and easy citation to promote sharing, and providing on-demand computational platforms to ease analytics. Unfortunately, we observe that a certain segment of our scientific user base cannot take full advantage of our cyberinfrastructure. Due to concerns over the privacy and confidentiality of their data sources, or the potential for commercial exploitation of their raw datasets, these researchers are isolating themselves within siloed data repositories and well-protected computational enclaves rather than sharing their datasets with fellow scientists.
This talk will describe cryptographic techniques that are ready to give scientific researchers mechanisms for collaborative analytics over their datasets while keeping those datasets protected and confidential. Secure multi-party computation (MPC) is a cryptographic technology that allows independent organizations to compute an analytic jointly over their data in such a manner that nobody learns anything other than the desired output. Hence, MPC empowers organizations to make their data available for collective data aggregation and analysis while still adhering to pre-existing confidentiality constraints, legal restrictions, or corporate policies governing data sharing. Our new Conclave framework can connect to many existing backend stacks where the data already live, can automatically analyze a query to identify when a computation must cross data silos, and can leverage MPC in a scalable and usable manner when it is necessary to enable the computation.
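To make the MPC idea concrete, the following is a minimal sketch of one classic building block, additive secret sharing, used to compute a joint sum. It is an illustration only and not Conclave's protocol: all parties are simulated inside one process, the adversary is assumed to be honest-but-curious, and the names `MODULUS`, `share`, and `secure_sum` are hypothetical choices made for this example.

```python
import secrets

# Assumption for this sketch: all arithmetic is done modulo a fixed prime
# that is larger than any sum the parties could produce.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that add up to it mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the original value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate n parties jointly computing the sum of their private inputs."""
    n = len(private_inputs)
    # Row i holds the shares created by party i; share j is sent to party j.
    all_shares = [share(v, n, modulus) for v in private_inputs]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Combining the published partial sums reveals the total and nothing else.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Three organizations, each holding one confidential count.
    print(secure_sum([120, 45, 310]))  # prints 475
```

Each individual share is a uniformly random number, so a party that sees only the shares sent to it learns nothing about any other party's input; only the final total is revealed. Real deployments also need secure channels between parties and protocols for richer operations than addition.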
In summary, while data-sharing cyberinfrastructures today are intended to allow everyone to benefit from the initial cost of having one researcher collect data, privacy concerns (and the resulting breakdown of data sharing) turn that one-time cost into a marginal cost that every researcher who wants access to the data must pay. We will describe how a holistic integration of secure MPC into a scientific computing infrastructure addresses a growing need in research computing: enabling scientific workflows involving collaborative experiments or replication/extension of existing results when the underlying data are encumbered by privacy constraints.
Speaker Bios:
Mayank Varia is a research associate professor of computer science at Boston University and the co-director of the Center on Reliable Information Systems & Cyber Security (RISCS). His research interests span theoretical and applied cryptography and their application to problems throughout and beyond computer science. He currently directs an NSF Frontier project that addresses grand challenges in cloud security, aiming to design an architecture where the security of the system as a whole can be derived in a modular, composable fashion from the security of its components (bu.edu/macs). He received a Ph.D. in mathematics from MIT for his work on program obfuscation.
Andrei Lapets is Associate Professor of the Practice in Computer Science, Director of Research Development at the Hariri Institute for Computing, and Director of the Software & Application Innovation Lab at Boston University. His research interests include cybersecurity, formal methods and domain-specific programming language design, and data science. He holds a Ph.D. from Boston University, and A.B. and S.M. degrees from Harvard University.