Advances in Differentially-Private Synthetic Data Generation

Head shot of Cristobal Guzman. Differential Privacy (DP) has become the de facto standard for protecting users’ sensitive data in data analysis. Within this field, one of the most challenging problems is private synthetic data generation, which allows high-quality synthetic data to be shared publicly while retaining the original data’s statistical properties. However, worst-case query workloads often face steep computational or error-rate bottlenecks.

On Monday, June 22, 2-3pm, join the Data Science Institute for a seminar with Dr. Cristóbal Guzmán, Associate Professor, Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile. In this seminar, Dr. Guzmán will present two recent theoretical advancements that address these challenges. The first is Private Relative Error Multiplicative weight update (PREM), a framework designed to produce DP synthetic data with relative error guarantees. While traditional DP mechanisms incur worst-case additive errors that must scale polynomially with the dataset, domain, or query family size, PREM establishes relative error bounds which are poly-logarithmic on the aforementioned quantities.

The second is the computational complexity of generating DP synthetic data by analyzing it through the lens of parameterized complexity. Dr. Guzmán will establish that private synthetic data generation is Fixed-Parameter Tractable (FPT) when parameterized by the treewidth of the query family’s incidence graph. He will outline two distinct algorithmic approaches that achieve optimal error rates across all regimes: a linear programming (LP) approach where the separation problem for the LP dual is rendered FPT via dynamic programming, and a subsampled private multiplicative weights method where sampling from the underlying high-dimensional distribution is shown to be FPT.

The seminar will be held in the Morgridge Hall Seminar Room (7560). Light refreshments will be served.