In today’s digitally driven economy, data is the new oil. Companies and organizations are hungry for access to new data, leading to an ever-increasing demand for cross-organizational and cross-industry data collaborations. Unfortunately, data collaborations are challenged by equally increasing concerns about data security, privacy and confidentiality. This often leaves companies unable to extract value from sensitive data and slows the pace of innovation.
For instance, the medical industry longs for cross-industry data collaborations to advance medical research and drug discoveries with the help of AI, but it must also contend with regulatory and legal restrictions related to patient privacy. Similarly in the banking industry, cross-institutional data collaboration is crucial to combat financial crimes, such as money laundering, yet the cost of such collaboration is often prohibitively high due to data privacy and confidentiality regulations.
Wouldn’t it be nice to have a technology that can facilitate data collaboration and computation without ever revealing or jeopardizing the underlying data? That’s where privacy computing technologies (a.k.a. privacy-enhancing technologies) come in. In short, privacy computing technologies include a range of hardware or software solutions designed to perform computations on data, thus extracting value from data, without risking the privacy and security of the data itself.
Gartner listed privacy-enhancing computation, its term for privacy computing, among its top strategic technology trends for 2021. In this article, I will briefly discuss a few of the privacy computing approaches and share my view from a deep-tech VC’s perspective.
Secure multiparty computation
Secure multiparty computation (MPC) is a software-based security protocol where multiple data owners jointly compute a function over their individual inputs while keeping the input data private. Data security is typically achieved by splitting each party’s data into random-looking pieces (secret shares) and distributing those pieces across multiple parties for joint computation, all without the need to trust any single party (a.k.a. trustlessness).
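To make the idea of joint computation over private inputs concrete, here is a minimal sketch of one common MPC building block, additive secret sharing, used to compute a sum. It is an illustration under stated assumptions rather than a production protocol: the function names, the modulus and the three-party scenario are hypothetical, all parties are assumed to follow the protocol honestly, and the network is simulated with plain Python lists.

```python
import secrets

# Assumption: a public modulus agreed on by all parties. Inputs must be
# non-negative and their total must stay below this value.
MODULUS = 2**61 - 1

def share(secret, n_parties):
    """Split `secret` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing about the secret."""
    return sum(shares) % MODULUS

def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their inputs without revealing them."""
    n = len(private_inputs)
    # Party i splits its input into n shares and sends share j to party j.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received (one from every party).
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; combined, they yield the total.
    return reconstruct(partial_sums)

if __name__ == "__main__":
    # Hypothetical example: three institutions pool a count without disclosing their own.
    print(secure_sum([52000, 61000, 58000]))
```

Note that every party exchanges a share with every other party, which hints at why real MPC deployments are sensitive to the quality of the network links between participants.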
Mathematically speaking, MPC is an elegant and secure approach, though it has certain inherent issues in practical applications. For example, MPC computations involve a large number of data exchanges among parties and, as a result, are sensitive to network latency and are often limited by the slowest data link among the parties. Many researchers are continuously improving MPC technology. Startups like Baffle and Inpher, to name just a few, have managed to gain traction with practical MPC use cases, especially in the finance and healthcare sectors.
Trusted execution environment
Another important privacy computing approach is TEE, sometimes called trusted enclave or confidential computing. TEE technology is a hardware-based solution that uses a secure area of a CPU to decrypt data, compute on it and re-encrypt the results. Outside of the enclave, data is always encrypted. Intel, AMD and other chip makers offer various versions of TEE chips.
TEE is a flexible and efficient confidential computing technique and can scale relatively easily. However, the security of the TEE approach is often questioned because of its vulnerability to hardware exploits and potential vendor backdoors. Another issue with TEE is that some security flaws may require hardware upgrades rather than simple software or firmware patches. Despite these concerns, TEE technology has seen decent adoption, with Microsoft Azure using Intel’s SGX solution and Google Cloud using AMD’s SEV technology on EPYC processors. Many big tech companies, as well as startups such as Fortanix and Anjuna, are actively expanding TEE use cases for new market verticals, including banking, healthcare and manufacturing.
Federated learning
Federated learning (FL) is an interesting privacy computing technique with a focus on data privacy in AI model training. Do you ever wonder how the texting app on your smartphone can predict the next word you’re about to type? Well, chances are its prediction model was trained using FL techniques.
Instead of collecting user input data (typed words in this case) from individual devices to train a keyboard prediction model at a central server, FL techniques distribute the prediction model to the edge devices to be trained locally. After each iteration of local training, only the gradient information is sent back to the central server, where the prediction model parameters are updated and sent back to the edge for further training. After a number of iterations, you have a globally trained keyboard prediction model without ever moving the individual data from the edge devices.
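The training loop described above can be sketched in a few lines. This is a deliberately tiny, hypothetical example: the "model" is a single weight in y = w * x rather than a keyboard predictor, the function names and hyperparameters are assumptions, and the server takes a plain unweighted average of the device gradients (real systems often weight by the amount of data on each device).

```python
def local_gradient(w, data):
    """Mean-squared-error gradient for y = w * x, computed on one device's data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def federated_train(device_datasets, rounds=200, lr=0.05):
    """Simulate a server coordinating training across several edge devices."""
    w = 0.0  # global model parameter held by the central server
    for _ in range(rounds):
        # The server sends w to each device; each device computes a gradient
        # on its own data. The raw (x, y) pairs never leave the device.
        gradients = [local_gradient(w, data) for data in device_datasets]
        # The server averages the gradients and updates the global model.
        w -= lr * sum(gradients) / len(gradients)
    return w

if __name__ == "__main__":
    # Hypothetical private datasets, all generated by the rule y = 3 * x.
    devices = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(3.0, 9.0)],
        [(0.5, 1.5), (1.5, 4.5)],
    ]
    print(federated_train(devices))  # converges toward 3.0
```

In this sketch the server sees only the list `gradients`, never the device datasets, which is exactly the information it receives in the scheme described above.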
This approach by itself is actually not secure, since the central server could in theory reverse engineer the original data using the gradient information. As such, FL is often used in conjunction with other encryption techniques. For example, Hong Kong-based Clustar utilizes FL in conjunction with FPGA-based homomorphic encryption technology, which we’ll discuss next, to deliver a highly efficient and secure FL solution for the financial sector.
Fully homomorphic encryption
Finally, let’s take a look at FHE, a software-based security protocol where user data is encrypted such that mathematical computations can be performed on the encrypted data without ever needing to decrypt the data in the first place.
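To illustrate what computing on encrypted data looks like, here is a toy sketch using the Paillier cryptosystem. Two loud caveats: Paillier is only additively (partially) homomorphic, not fully homomorphic, so it supports adding ciphertexts and multiplying them by known constants but not multiplying two ciphertexts together, which FHE schemes also allow. And the primes used here are far too small to be secure; they, along with the function names, are assumptions chosen to keep the example short. The sketch needs Python 3.8 or later for modular inverses via `pow`.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation. p and q are small known primes: NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    """Encrypt integer 0 <= m < n using generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)

if __name__ == "__main__":
    n, private_key = keygen()
    c1, c2 = encrypt(n, 1200), encrypt(n, 345)
    # An untrusted server could run the next line without the private key.
    c_sum = add_encrypted(n, c1, c2)
    print(decrypt(n, private_key, c_sum))
```

The party that calls `add_encrypted` never sees 1200, 345 or their sum; only the holder of the private key can decrypt the result.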
While the concept of FHE was envisioned in the 1970s, the breakthrough came in 2009, when Craig Gentry constructed the first FHE scheme as part of his Ph.D. dissertation. Since then, many FHE schemes have emerged with vastly improved performance and security.
FHE is considered one of the most secure protocols because it does not require trust in any third party that touches the data at any point in its lifecycle: in transit, at rest or in use. In addition, most modern FHE schemes are built on lattice problems that are believed, though not proven, to be resistant to cryptanalytic attacks by a quantum computer.
However, FHE does have one significant drawback: FHE computations are excruciatingly slow, by some estimates as much as 100,000 times slower than computation on cleartext. While many consider this the Achilles’ heel of FHE, a venture investor may see it as an opportunity.
If history teaches us anything, there could be interesting parallels between FHE and the early days of RSA (Rivest-Shamir-Adleman) technology. At its inception in the 1970s, a 1024-bit RSA encryption reportedly took more than 10 minutes to complete, making it impractical. Today, RSA is used in over 90% of secure data transmissions by some estimates, and the same encryption takes less than 0.1 milliseconds on an edge device, all thanks to algorithmic improvements and advances in semiconductor technology.
Similarly, software and hardware acceleration could be the key to unleashing the full potential of FHE technologies. In recent months, several FHE startups successfully raised sizable amounts of capital, including software provider Duality and high-performance computing chip developer Cornami*.
There are many more privacy computing technologies that are not discussed here, including zero-knowledge proofs, differential privacy, synthetic data and others. The bottom line is that privacy computing technologies are critical to resolving the seemingly unsolvable conflict between the need for data collaboration and the need for data security.
Early adoption of the technologies will likely take place where there’s tremendous value to be created from data collaboration, yet the cost of the collaboration is prohibitively high, such as in the medical and banking industries.
As privacy computing technologies become more mature and performance improves, broader adoption is expected. As Gartner predicted, “by 2025, half of large organizations will implement privacy-enhancing computation for processing data in untrusted environments and multiparty data analytics use cases.”
This is an exciting area with tremendous opportunities for both hardware and software innovations. I can’t wait to see what the future holds for privacy computing technologies.
*Note: The author’s firm has an investment stake in Cornami.
John Wei is an investment director at Applied Ventures, LLC.