American Family Funding Initiative Awards

American Family Insurance has partnered with UW through the Data Science Institute to offer mini-grants for data science research at UW–Madison. The goal of the American Family Funding Initiative is to stimulate and support highly innovative, groundbreaking research. Launched in spring 2020, this initiative is expected to position UW-Madison faculty to launch and further cutting-edge data science research and be more competitive when applying for extramural research funding.

To date, 40 teams of UW-Madison faculty and collaborators have been awarded nearly six million dollars through the American Family Funding Initiative for data science research. These awards are described below.

2024 Awards

Multimodal Method Approach for Risk Assessment and Reasoning (PI: Kaiping Chen; Co-PI: Junjie Hu)

This project addresses the need for advanced risk assessment in sectors like insurance, criminal justice, and misinformation management by harnessing the power of Vision-Language Models (VLMs) for analyzing complex, multimodal data. By integrating real-world data sources, the proposed system will learn to offer detailed risk assessments and suggestions, advancing the field of AI in risk communication and providing significant societal benefits.

Kaiping Chen (kchen67@wisc.edu) is an assistant professor of computational communication in the Department of Life Sciences Communication. Her interdisciplinary research employs data science and machine learning methods, as well as interviews, to examine how digital media and technologies affect political accountability to public well-being.

Junjie Hu (junjie.hu@wisc.edu) is an assistant professor in the Departments of Computer Sciences and Biostatistics and Medical Informatics. He has a broad interest in natural language processing and machine learning, and his research goal is to build robust, intelligent systems that evolve with changes in the environment and interact with people speaking different languages.

Data-Driven Wildfire Ignition Prediction from an Insurance Perspective (PI: Min Chen)

This project will develop an innovative model to predict where and when wildfires are likely to occur, understand the complex factors that lead to wildfire events, and predict risks with greater accuracy than ever before. This approach not only sets a new benchmark for environmental risk assessment models, but also opens avenues for applying similar methodologies in other fields where prediction and risk assessment are crucial.

Min Chen (min.chen@wisc.edu) is an assistant professor in the Department of Forest and Wildlife Ecology. His research focuses on how climate change and human activities affect terrestrial ecosystems, as well as their feedback to the human and Earth systems.

Multimodal Foundation Model for Driver Behavior Profiling and Risk Analysis (PI: Song Gao)

Driver profiling is a procedure that categorizes drivers as safe or aggressive according to their driving behavior, using data such as speed, acceleration, braking, and steering. However, an “aggressive” driver may drive more cautiously under conditions such as traffic congestion or extreme weather. This research will use machine learning and foundation models to better understand the dynamic risk context for driver profiling, enabling more effective multimodal data-driven decision-making in both the private and public sectors.

Song Gao (song.gao@wisc.edu) is an associate professor in the Department of Geography, where he leads the Geospatial Data Science Lab. His research areas range from geospatial artificial intelligence (GeoAI) to human mobility, urban informatics, transportation and public health.

Sensitivity Analysis and Counterfactual Analysis of Insurance Data (PI: Hyunseung Kang)

The project develops easy-to-use tools to answer a broad range of counterfactual questions arising in the insurance industry, such as, “What would be the loss from a claim if a policyholder had a ‘Fair’ credit score instead of an ‘Excellent’ credit score?” Beyond insurance, these tools address more fundamental questions in data science about causality and have the potential to strengthen claims about causality from observational data.

Hyunseung Kang (hkang84@wisc.edu) is an associate professor in the Department of Statistics. His research is focused on developing methods to analyze causal relationships by using instrumental variables, econometrics, semi/nonparametric methods, network analysis, and machine learning.

A Framework for Valid and Reliable Audits of Biases in Large Language Models (PI: Shamya Karumbaiah; Co-PI: Daniel Bolt)

This project will develop a framework to audit large language models (LLMs), which power AI applications, for harmful biases. The framework will make fairness failures and consequent harms more transparent, build stakeholder trust in issues of bias, and improve LLM reliability in diverse deployment contexts and populations by informing future work on fairness failures. This direct, transparent approach to LLM evaluation will build stakeholders’ agency and power to be critical consumers.

Shamya Karumbaiah (chodumadakar@wisc.edu) is an assistant professor in the Department of Educational Psychology. She works at the intersection of machine learning and learning sciences, studying ways to fairly and equitably promote student engagement and learning in adaptive and artificially intelligent educational systems.

Daniel Bolt (dmbolt@wisc.edu) is the Nancy C. Hoefs-Bascom Professor of Educational Psychology and chair of the Quantitative Methods area. His research interests are in the theory and application of psychometric methods in education and psychology.

Interpretable Causal Inference for Multimodal and Relational Data (PI: Keith Levin)

Causal inference tools, which seek to establish causation rather than correlation between quantities, are not well-suited to large-scale, multimodal data that is typically noisy and incomplete. This project will develop methods that can give sensible answers to causal questions applied to multimodal data. The researchers aim to develop tools that can discern the direct and indirect effects of a treatment on an outcome of interest.

Keith Levin (kdlevin@wisc.edu) is an assistant professor in the Department of Statistics. His research focuses on network analysis, dimension reduction, concentration inequalities, and clustering problems, with applications to neuroscience and speech processing.

Safer Driving Through Optimized Telematics-Based Feedback (PI: Tony McDonald; Co-PI: Yonatan Mintz)

The current volume of vehicle-level telematics data offers new opportunities to promote safe driving through driver feedback. Pilot driver feedback programs face barriers to broad adoption and long-term behavior change. This project will address these challenges by developing a novel behavioral analytics algorithm to personalize driver feedback and optimize safety nudges. The goal of this project is to create a novel system that can improve driving safety, and then validate findings through an on-road telematics study.

Tony McDonald (admcdonald@wisc.edu) is an assistant professor in the Department of Industrial and Systems Engineering. His work bridges the fields of human factors engineering and machine learning, allowing them to inform each other and work in concert to create a safer world.

Yonatan Mintz (ymintz@wisc.edu) is an assistant professor in the Industrial and Systems Engineering department at UW–Madison. His research focuses on the application of machine learning and automated decision making to human-sensitive contexts such as personalized health care.

Novel Methods for Hail Detection (PI: Kevin Ponto; Co-PI: Ross Tredinnick)

The project aims to create a novel method for detecting vehicular hail damage using sensing technology found on consumer devices. The proposed product will enable a consumer to document hail damage remotely and asynchronously, thereby alleviating the need to send an individual into the field to make assessments in person.

Kevin Ponto (kbponto@wisc.edu) is an associate professor at the Wisconsin Institute for Discovery and an Audrey Rothermel Bascom Professor in the Design Studies department at the School of Human Ecology. The long-term goal of his research is to redesign the interface between the physical and virtual worlds, combining methods from the fields of virtual reality, ubiquitous computing, human computer interaction, and design.

Ross Tredinnick (rdtredinnick@wisc.edu) is a systems programmer at the Wisconsin Institute for Discovery. His interests include immersive visualization, 3D environment scanning, and virtual reality research and development.

Developing a Prototype Data-Driven Stochastic Convective Hazards Emulator (PI: Daniel Wright; Co-PI: Yagmur Derin)

Extreme rainfall, hail, and high winds are among the most damaging and difficult-to-predict weather phenomena. High-resolution computer models can predict these hazards, but they require enormous computing power. This project will develop an emulator that can quickly generate high-resolution hazard predictions under current and future conditions at a lower computational expense. This will allow insurers to better assess uncertainty in rainfall, hail, and wind hazards in a warming climate.

Daniel Wright (danielb.wright@wisc.edu) is the Arno Lenz Memorial Associate Professor of Water Resources Engineering in the Department of Civil and Environmental Engineering. His research interests include hydrometeorology and hydroclimatology, the role of rainfall space-time variability in floods and other environmental phenomena, and applications of satellite and ground-based remote sensing.

Yagmur Derin is a research scientist in the Department of Civil and Environmental Engineering. Her research interests lie in quantitative understanding of precipitation and related natural hazards, with focus areas including satellite and radar-based remote sensing and characterization of orographic precipitation.

2023 Awards

3D Scanning and Machine Learning for Comprehensive Property Inspection (PI: Joao Dorea)

This project will prototype a machine learning framework based on data collection via mixed reality headsets and smartphones, as well as explore data processing through edge- and cloud-computing infrastructure, to generate near-real-time 3D reconstructions. The novelty of this project stems from its potential to leverage regular cameras for creating highly realistic scene reconstructions and to generate multimodal datasets by combining vision and IoT sensor devices. The systematic collection of these datasets will provide new means for property inspection and risk assessment, allowing for more efficient and accurate decision-making.

Joao Dorea (joao.dorea@wisc.edu) is an assistant professor in the UW–Madison Department of Animal and Dairy Sciences. Specializing in precision agriculture and data analytics, he applies AI to optimize farm decision making and improve animal nutrition and health.

Learning what is Relevant for Counterfactual Policy Evaluation (PI: Josiah Hanna)

As AI and autonomous systems are trusted with increasingly important decisions, it becomes imperative to be able to evaluate the performance of their decision-making policy before deployment. One way to accomplish this is to take data recorded while making decisions with a previously used policy and answer the counterfactual question, “What would have happened if the new policy had been making decisions instead of the older policy?” The goal of this project is to introduce novel methods for counterfactual policy evaluation in sequential decision-making settings that overcome the practical limitations of existing methods.

Josiah Hanna (jphanna@cs.wisc.edu) is an assistant professor in the UW–Madison Computer Sciences Department. His research develops and applies reinforcement learning algorithms that are effective with a limited amount of data and integrates these algorithms into complete autonomous agents. His long-term research goal is developing fully autonomous agents that learn how to achieve goals from experience.

Improving Auto-labeling with Confidence Functions (PI: Ramya Korlakai Vinayak; Co-PI: Fred Sala)

Auto-labeling systems are emerging as a popular and widely used alternative to reduce the cost associated with creating new training datasets. Such systems aim to reduce human labeling effort by automatically labeling data points using models that are actively trained in an iterative manner. There is very little understanding of how reliable the data obtained by such systems are. Confidence functions that quantify the uncertainty in a model are key components in these systems. This project will build a systematic understanding of confidence functions that can guarantee reliability for auto-labeling systems. It will also build practical algorithms based on our theoretical analysis, along with an evaluation platform to benchmark their performance.

Ramya Korlakai Vinayak (ramya@ece.wisc.edu) is an assistant professor in the Department of Electrical and Computer Engineering at UW–Madison. Her research interests span the areas of machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical and practical challenges that arise when learning from societal data.

Detecting AI-altered Images to Combat Claims Fraud (PI: Yong Jae Lee)

Insurance fraud costs businesses and consumers billions of dollars annually, which in turn results in premium increases. With companies relying more on automated, self-service transactions, the need to reliably detect fraudulent claims is becoming imperative. One manifestation of claims fraud is fake visual assets, such as images exaggerating damages, that have been generated or altered using AI software (deepfakes). This project will develop novel algorithms that can detect fake images from various breeds of AI generative models, even those unseen during training.

Yong Jae Lee (yongjaelee@cs.wisc.edu) is an associate professor in the UW–Madison Department of Computer Sciences. His research interests are in computer vision and machine learning. He is particularly interested in creating robust visual recognition systems that can understand visual data with minimal human supervision.

Supervised Sequence Analysis for High-Dimensional Discovery of Predictive Histories for Health and Economic Outcomes (PI: Adeline Lo; Co-PI: Héctor Pifarré i Arolas)

Finding highly predictive profiles from medical histories or past insurance product performance can be especially informative for improving risk assessment or evaluating new products. Sequence Analysis (SA) is particularly well-suited to handling histories of data in an interpretive way. However, a key limitation of current SA methods is that the sorting stage is unsupervised and disconnected from the predictive stage. This project will innovate on SA methods with supervised techniques that address this limitation.

Adeline Lo (aylo@wisc.edu) is an assistant professor in the UW–Madison Department of Political Science and Glenn B. & Cleone Orr Hawkins Chair. Her research interests lie in the design of statistical tools for prediction and measurement for applied social sciences. She works with and constructs tools for observational, text, experimental, and network data.

Behavioral Analytics for Personalized Safer Driving (PI: Yonatan Mintz)

Most existing usage-based insurance uses “one size fits all” discounts and rate adjustments that fail to account for differences in policyholders’ behavior. This project will develop a framework that uses behavioral modeling to create personally tailored incentives and discounts for safer driving, using telematics data and simulation. These incentives will be calculated to reduce the expected number of future claims while adapting to the drivers’ individual levels of risk.

Yonatan Mintz (ymintz@wisc.edu) is an assistant professor in the Industrial and Systems Engineering department at UW–Madison. His research focuses on the application of machine learning and automated decision making to human-sensitive contexts such as personalized health care.

Reliable and Scalable Shapley Value Estimation for Interpretable ML (PI: Garvesh Raskutti)

Interpretable machine learning has become an area of significant interest due to the increased use of black-box methods such as neural networks, random forests, and others. One model-agnostic way of interpreting these methods is to estimate Shapley values, which capture how much each variable (or subset of variables) contributes to prediction performance. However, one of the major challenges for Shapley value estimation is scalability, especially when the number of variables is large. This project will address this challenge by providing a reliable and scalable approach for computing Shapley values for arbitrary machine learning prediction models.

Garvesh Raskutti (raskutti@stat.wisc.edu) is an associate professor in the Department of Statistics at UW–Madison. His research interests include statistical machine learning, optimization, graphical and network modeling, and information theory with applications to systems biology and neuroscience.

Data-Efficient Customization for Large Pretrained Models (PI: Fred Sala)

Large pretrained models must be customized for specialized applications in industry. There is no standard way to perform such customization, and existing approaches require large amounts of hand-annotated data. This project seeks to address this challenge by building a unified framework to customize large pretrained models with minimal human annotation or effort. It will develop techniques to make models compatible with any data modality, fine-tune them with minimally labeled data, and modify them to be more responsive to human feedback. This work has the potential to make powerful, general models useful for specialized industry applications.

Fred Sala (fredsala@cs.wisc.edu) is an assistant professor in the Department of Computer Sciences at UW-Madison. His research interests are in machine learning, focusing on data-efficient learning, automated machine learning, and integrating geometry into machine learning pipelines.

New Joint Modeling Approach for Risk Assessment of Digital Medical Records (PI: Raj Veeramani; Co-PI: Shiyu Zhou)

Electronic health records (EHRs) contain rich information regarding an individual’s medical history and health condition that can yield valuable insights to predict their life expectancy. This project aims to create a novel joint modeling approach that utilizes diverse types of EHR data, such as health event data (e.g., illnesses faced) and physiological data over time (e.g., blood pressure) from multiple individuals to accurately assess individualized mortality risk and life expectancy.

Raj Veeramani (raj.veeramani@wisc.edu) is the E-Business Chair Professor at UW–Madison, with joint appointments in the College of Engineering and the School of Business. His research focuses on new frontiers of digital transformation, industrial data analytics, IoT technologies and applications, smart and connected systems, and supply chain management.

ELSA: Efficient Label Shift Adaptation for Accurate Claim Fraud Detection (PI: Jiwei Zhao)

Insurance fraud has increased dramatically with the expansion of modern technology and global communication, resulting in the loss of billions of dollars worldwide each year. Both supervised and unsupervised machine learning algorithms have been applied to detect a variety of fraudulent cases. This project will develop a novel, semi-supervised learning method for accurate and efficient fraud detection.

Jiwei Zhao (jiwei.zhao@wisc.edu) is an associate professor of Biostatistics and Medical Informatics at UW–Madison. He uses statistical methods and machine learning techniques to analyze data with massive structures.

2022 Awards

Multi-Modal Analytics for Unbiased Estimation of Driving Behavior (PI: Suman Banerjee)

Understanding driving behavior is central to efficient, safe transportation and associated insurance mechanisms. This project seeks to create an unbiased system for evaluating driving behavior that will use multi-modal signals, especially from audio-visual sensors, to learn contextual information about why certain behaviors happen.

Suman Banerjee (suman@cs.wisc.edu) is the David J. DeWitt Professor in the Department of Computer Sciences at UW-Madison. His primary research interests include networking and distributed systems, specifically mobile and wireless networking systems, with many applications in smart transportation, smart healthcare, and in secure and sustainable systems.

Quasi-Experimental Designs for Learning Systems (PI: Amy Cochran; Co-PIs: Gabriel Zayas-Caban and Brian Patterson)

A growing number of systems, including hospitals and insurance companies, aim to derive knowledge from internal data to improve their day-to-day operations. This project will develop a causal inference framework for estimating the effects of interventions on these systems and provide algorithms to guide their use of risk prediction models in their operations, with the ultimate goal of improving services and reducing costs.

Amy Cochran (cochran4@wisc.edu) is an assistant professor at UW-Madison with a joint appointment in the Departments of Mathematics and Population Health Sciences. She works in the areas of computational psychiatry, digital mental health, and causal inference.

Fairness Guarantees for Learners Without Explicit Access to Demographics (PI: Kassem Fawaz)

The most advanced sample complexity bounds on fair machine learning are called multicalibration convergence bounds. These bounds specify the number of samples required to achieve performance parity across many population demographics. This project will yield additional perspective on both algorithmic fairness and multicalibration error convergence bounds. Further, it will enable machine learning practitioners to easily understand the convergence behavior of multicalibration error for a myriad of classifier architectures.

Kassem Fawaz (kfawaz@wisc.edu) is an assistant professor in the Department of Electrical and Computer Engineering at UW-Madison. His research interests primarily include security and privacy for users interacting with their devices, with emphasis on voice interfaces, privacy policies, malware detection, data privacy, and wireless security and privacy.

Counterfactual Evaluation of Sequential Decision Policies (PI: Josiah Hanna)

One way to evaluate AI-based decision-making policies and autonomous systems before they are deployed is to take data from a previously used policy and answer the counterfactual question, “What would have happened if the new policy had been making decisions instead of the older policy?” This project will introduce novel methods for counterfactual policy evaluation in sequential decision-making, where even small changes in how decisions are made can lead to drastically different outcomes over time.

Josiah Hanna (jphanna@cs.wisc.edu) is an assistant professor in the UW-Madison Computer Sciences Department. His research develops and applies reinforcement learning algorithms that learn with small amounts of data. His long-term research goal is developing AI systems that can quickly learn new capabilities from experience.

Auto-labeling Foundations (PI: Ramya Korlakai Vinayak; Co-PI: Fred Sala)

While crowdsourcing is a popular way to collect labeled training data for machine learning, it is expensive and time-consuming to hand-label each data point. Systems that automatically label data points while actively learning a model perform well in practice, but there is no theoretical understanding of what performance guarantees can be expected from these systems, or whether the resulting biased datasets can even be trusted. This project aims to close this gap by developing theoretical foundations for characterizing the performance of auto-labeling systems.

Ramya Korlakai Vinayak (ramya@ece.wisc.edu) is an assistant professor in the Department of Electrical and Computer Engineering at UW-Madison. Her research interests span the areas of machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical and practical challenges that arise when learning from societal data.

Contrastive Language-Image Learning for Out-of-distribution Detection (PI: Sharon Li)

A major issue that prevents machine learning algorithms from being deployed to real-world problems is the safe handling of anomalous data that differs from the training distribution data. Prior research on out-of-distribution (OOD) data detection has been primarily driven by image recognition tasks. This project will pioneer new directions in contrastive image-language learning for OOD detection. We will explore how language and vision can provide complementary sources of information to better estimate uncertainty in claim fraud detection and other risk scenarios.

Sharon Yixuan Li (sharonli@cs.wisc.edu) is an assistant professor in the Department of Computer Sciences at UW-Madison. Her broad research interests are in deep learning and machine learning. She develops algorithms to enable reliable open-world learning, which can function safely and adaptively in the presence of evolving and unpredictable data streams.

Optimal Features for Heterogeneous Matrix Completion (PI: Daniel Pimentel-Alarcon; Co-PIs: Jeff Linderoth and Jim Luedtke)

Matrix completion, or filling in the unknown entries in a matrix, is one of the most fundamental problems in data science, and existing models are incompatible with some types of data. This project will develop a new model and algorithms specifically tailored to complete matrices with heterogeneous data, with important applications in recommender systems, computer vision systems for processing and analyzing visual images, data inference, and outlier detection.

Daniel Pimentel-Alarcon (pimentelalar@wisc.edu) is an assistant professor in the Department of Biostatistics and Medical Informatics in the UW-Madison School of Medicine and Public Health. His research focuses on robust machine learning methods to identify patterns in big and messy data. He specifically examines robust machine learning of mixtures, and linear and nonlinear structures.

Dependence Modeling for Multi-Risk Insurance Policies (PI: Peng Shi)

Insurance products often integrate multivariate risks in design. Examples range from multi-peril policies and multi-coverage contracts to bundled products. This project will investigate the benefit of dependence models in product management for insurance products with multivariate risks embedded in their design.

Peng Shi is a professor in the Risk and Insurance Department, and the Charles and Laura Albright Professor in Business and Finance, at the Wisconsin School of Business. His research focuses on actuarial data science, probabilistic forecasting and predictive modeling, insurance and risk analytics, dependence models and multivariate analysis, machine learning and statistical learning, and intensive longitudinal data methods.

Doing More with Linear Transformers (PI: Vikas Singh)

Machine learning and computer vision methods that drive applications such as object detection, image recognition, language understanding, and voice recognition make use of models known as “transformers” that can require weeks or months to train. Deployment of such parameter-heavy models also involves access to specialized hardware resources. This project will significantly extend the capabilities of current models based on algorithmic and implementation improvements. We will focus on ultra-long temporal/spatio-temporal sequences, coming from a broad variety of applications, and study the key challenges that need to be overcome to allow efficient training and deployment of transformer models in these settings.

Vikas Singh (vsingh@biostat.wisc.edu) is a Vilas Distinguished Achievement Professor in the Department of Biostatistics and Medical Informatics at the UW-Madison School of Medicine and Public Health. His group works on algorithm development for image analysis, computer vision, and machine learning problems motivated from applications in biomedical sciences, engineering, and industry.

2021 Awards

Dynamic Workflow Optimization and Planning for Insurance Applications (PI: Laura Albert)

Machine learning tools that recognize patterns or predict claims have the potential to improve service and reduce costs in the insurance industry. This project will use an optimization modeling framework to prescribe innovative, dynamic workflow routing decisions, balance the workload across claims agents, improve client satisfaction, and control costs.

Laura Albert (laura@engr.wisc.edu) is a professor of Industrial and Systems Engineering and a Harvey D. Spangler Faculty Scholar at UW-Madison. Her research interests are in the field of operations research, with a particular focus on discrete optimization with application to homeland security and emergency response problems.

Reducing Bias in Human-AI Conversation (PI: Kaiping Chen; Co-PI: Sharon Li)

AI models that power intelligent assistants like Google Home and other chatbots may produce responses biased towards dominant groups, while marginalizing the needs of underrepresented populations. This project seeks to mitigate inequality in AI decision-making through reducing unfairness in the algorithms that power chatbot responses.

Kaiping Chen (kchen67@wisc.edu) is an assistant professor of Computational Communication in the Department of Life Sciences Communication and faculty affiliate of the UW-Madison Robert and Jean Holtz Center for Science and Technology Studies, the Center of East Asian Studies, and the African Studies Program. Her research examines how deliberative designs can improve the quality of public discourse on controversial and emerging technologies.

Facilitating Wildfire Insurance Business with Big Data and Machine Learning (PI: Min Chen; Co-PI: Volker Radeloff)

Recent wildfires across the western US have caused enormous environmental hazards and economic losses. This project will prototype a machine learning framework, modeled on fires in California, that will improve prediction of wildfire probability and severity at daily, weekly, and monthly scales.

Min Chen (mchen392@wisc.edu) is an assistant professor in the Department of Forest and Wildlife Ecology and an affiliate with Nelson Institute Center for Climatic Research, UW-Madison. His research focuses on investigating terrestrial ecosystem carbon, water, and energy dynamics and their interactions with the climate system.

Query Design for Crowdsourced Clustering: Efficiency vs. Noise Trade-off (PI: Ramya Korlakai Vinayak)

While crowdsourcing is a popular way to collect labeled training data for supervised machine learning, non-expert crowdworkers often provide noisy answers. This project aims to increase understanding of how the ability of humans to learn and retain new concepts affects the quality and cost of crowdsourced data.

Safe and Reliable Machine Learning through Out-of-Distribution Detection (PI: Sharon Li; Co-PI: Jerry Zhu)

While machine learning models commonly assume that training and test data come from the same distribution, these models may encounter (and fail to safely handle) anomalous data that differs from the training distribution. This project will tackle this fundamental problem in machine learning, with the goals of automating detection of unexpected out-of-distribution (OOD) data and mitigating its effects.

Sharon Yixuan Li (sharonli@cs.wisc.edu) is an assistant professor in the Department of Computer Sciences at UW-Madison. Her broad research interests are in deep learning and machine learning. The goal of her research is to enable transformative algorithms and practices towards reliable open-world learning, which can function safely and adaptively in the presence of evolving and unpredictable data streams.

Developing Novel Mixed Reality Tools for Consumer Insurance Documentation (PI: Kevin Ponto; Co-PI: Ross Tredinnick)

Documentation and assessment of personal property following accidents or disasters is a major component of the insurance claims process. This project will take advantage of recent technological advances to create mixed reality 3D models and develop an application that allows for a more thorough inspection of damages.

Kevin Ponto (kbponto@wisc.edu) is an associate professor at the Wisconsin Institute for Discovery and the Design Studies Department in the School of Human Ecology. His research aims to improve the experience of virtual reality through new devices, interfaces, and techniques.

Lightweight Self-Attention for Detection and Image Classification (PI: Vikas Singh; Co-PI: Zhanpeng Zeng)

Machine learning and computer vision methods that drive applications such as voice recognition make use of models known as “transformers” that can require weeks or months to train. The overarching goal of this project is to enable efficient training of such models for potential use in natural language processing and object recognition for images.

Vikas Singh (vsingh@biostat.wisc.edu) is a Vilas Distinguished Achievement Professor in the Department of Biostatistics and Medical Informatics at UW-Madison. His group works on algorithm development for image analysis, computer vision, and machine learning problems motivated by applications in biomedical sciences, engineering, and industry.

Data-Aware Model Recycling (PI: Shivaram Venkataraman; Co-PI: Dimitris Papailiopoulos)

Data scientists spend considerable time training and fine-tuning machine learning models used in applications ranging from risk assessment to recommendation engines. This project will develop tools that can automate and accelerate these processes by intelligently reusing past computations.

Shivaram Venkataraman (shivaram@cs.wisc.edu) is an assistant professor in the Department of Computer Sciences at UW-Madison. His research interests are in designing systems and algorithms for large scale data analysis and machine learning.

Fast Machine Learning with Rich Human-Machine Interactions (PI: Jerry Zhu)

Enormous sets of labeled data are required to train a good machine learning model, and even methods such as active learning that speed up training require a significant investment in human annotators. The goal of this project is to design a set of novel interactive training methods that are theoretically guaranteed to outperform active learning.

Jerry Zhu (jerryzhu@cs.wisc.edu) is a professor in the Department of Computer Sciences at UW-Madison. His research interest is in machine learning, particularly machine teaching and adversarial sequential decision making.

Fall 2020 Awards

Machine Learning Approaches for Metadata Standardization (PI: Colin Dewey; Co-PI: Mark Craven)

Principal Investigator: Colin Dewey (colin.dewey@wisc.edu), Professor of Biostatistics and Medical Informatics.
Co-Principal Investigator: Mark Craven, Biostatistics and Medical Informatics.

Researchers and businesses are increasingly using large data sets, compiled from many sources, for training machine learning systems and performing statistical analyses. A major bottleneck arises from the fact that compiled data sets often contain unstandardized, unstructured metadata that describe each record. Manual standardization of metadata is labor intensive and often requires substantial expertise in the field of study.

To mitigate this issue, this project will develop machine learning approaches for automating the task of metadata standardization in large, heterogeneous data sets. The researchers will use state-of-the-art natural language processing models and develop active learning algorithms, which facilitate identification of records that would most benefit from human expert input. They will demonstrate the performance of these methods on the Sequence Read Archive—a vast repository of public biological sequence data.

Adaptive Operations Research and Data Modeling for Insurance Applications (PI: Michael Ferris)

Principal Investigator: Michael Ferris (ferris@cs.wisc.edu), Professor of Computer Sciences.

Uncertainty abounds in decision problems, and optimization, drawing on the power of data science, is a key tool for mitigating its effects. This project will deploy a new approach that separates strategic decision making from operational modeling, in the context of a claim adjustment problem in the insurance industry. In this setting, random accidents occur across a large service area, requiring agents to deploy to the site to assess, document and determine appropriate courses of action. Our approach differentiates normal workload from crisis situations. It will inform an operational model that schedules resources over time, both to service routine workloads in a cost-effective manner and to enable the company to react efficiently to crisis situations. The model can be applied to problems as diverse as disaster recovery, chemical spill mitigation and electricity planning for extreme weather events.

A Deep Learning Approach to User Location Privacy Protection (PI: Song Gao; Co-PI: Jerry Zhu)

Principal Investigator: Song Gao (song.gao@wisc.edu), Assistant Professor of Geography.
Co-Principal Investigator: Jerry Zhu, Computer Sciences.

User location information is a key component of both research and business intelligence. With the increasing availability of mobile devices and popularity of mobile apps, users in social network platforms actively share rich information about their locations on the Earth, the places they go and the activities they engage in. Those location-based profiles provide an invaluable source of information. However, mobility data is among the most sensitive data being collected by mobile apps, and users increasingly raise privacy concerns. The proposed research aims to develop a deep learning architecture that will protect users’ location privacy while keeping the capability for location-based business recommendations. The algorithms developed through this research may be applied in usage-based insurance (UBI) and other location intelligence domains.

GAN-mixup: A New Approach to Improve Generalization in Machine Learning (PI: Kangwook Lee; Co-PI: Dimitris Papailiopoulos)

Principal Investigator: Kangwook Lee (kangwook.lee@wisc.edu), Assistant Professor of Electrical and Computer Engineering.
Co-Principal Investigator: Dimitris Papailiopoulos, Electrical and Computer Engineering.

The recent successes of machine learning hinge on the ability of predictive models to generalize, or adapt well to previously unseen data. Data augmentation, the process of injecting artificial data points into a training set, is widely employed for improving generalization. One of the most prominent data augmentation algorithms is mixup, which helps achieve state-of-the-art generalization performance across several benchmark tasks.

While mixup algorithms are useful for improving generalization for a wide class of tasks, they have a few critical limitations. Mixup sometimes degrades generalization, which restricts the applicability of these algorithms. Moreover, current mixup algorithms do not have any theoretical performance guarantees. To address these challenges, the researchers will develop a computationally efficient mixup algorithm based on a generative adversarial network (GAN). They will also develop a theoretical framework for analyzing the performance of various mixup algorithms. This research will provide a new approach to improve generalization, with provable performance guarantees.
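For readers unfamiliar with mixup, the following is a minimal sketch of the standard (non-GAN) version of the idea: a synthetic training example is formed as a weighted blend of two real examples and their labels. It is not the GAN-based algorithm this project proposes. The function name `mixup_pair`, the `alpha` default, and the use of plain Python lists are assumptions made for illustration.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples into one synthetic example (standard mixup).

    x1, x2: feature vectors (lists of floats) of equal length.
    y1, y2: one-hot label vectors (lists of floats) of equal length.
    alpha:  shape parameter of the Beta distribution (illustrative default).
    """
    # Mixing weight in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the features and of the labels.
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the same weight is applied to features and labels, the blended label remains a valid probability vector whenever both inputs are.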

Integer Programming for Mixture Matrix Completion (PI: Jeff Linderoth; Co-PIs: Jim Luedtke, Daniel Pimentel-Alarcon)

Principal Investigator: Jeff Linderoth (linderoth@wisc.edu), Professor of Industrial and Systems Engineering.
Co-Principal Investigators: Jim Luedtke, Industrial and Systems Engineering; Daniel Pimentel-Alarcon, Biostatistics and Medical Informatics.

Matrix completion, or filling in the unknown entries in a matrix, is one of the most fundamental problems in data science. Matrix completion is used in applications such as recommender systems that predict the rating a user would give to an item, such as a movie or product, and then make recommendations to the user. This project will develop algorithms for solving a mixture matrix completion problem (MMCP), which has important applications not only in recommender systems, but also in computer vision systems for processing and analyzing visual images, data inference, and outlier detection.
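As a rough illustration of what matrix completion means in the recommender setting, the sketch below fills in missing ratings under the simplifying assumption that the matrix is approximately rank one, so each entry is a user factor times an item factor. It uses alternating least squares, a common baseline, and is not the mixture formulation or the integer programming approach this project develops. The function name `complete_rank1` and the use of `None` for missing entries are assumptions.

```python
def complete_rank1(matrix, iters=100):
    """Fill None entries assuming matrix[i][j] is roughly u[i] * v[j]."""
    rows, cols = len(matrix), len(matrix[0])
    u = [1.0] * rows  # one factor per row (e.g. per user)
    v = [1.0] * cols  # one factor per column (e.g. per item)
    for _ in range(iters):
        # Fix v and solve the least-squares problem for each u[i]
        # using only the observed entries in row i.
        for i in range(rows):
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Fix u and solve for each v[j] using observed entries in column j.
        for j in range(cols):
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    # Keep observed entries; predict the missing ones.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(cols)]
        for i in range(rows)
    ]
```

For example, given the ratings `[[2.0, 4.0, None], [1.0, None, 3.0], [3.0, 6.0, 9.0]]`, the two missing entries are predicted from the row and column factors fitted to the observed ones.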

Key to this research will be the development and application of advanced algorithmic techniques from integer programming, a powerful mathematical tool for solving optimization problems involving discrete choices. The work will pave the way towards the application of integer programming for a broad class of large-scale data science problems.

Developing a State-of-the-Science Regional Weather Forecasting System (PI: Michael Morgan; Co-PI: Brett Hoover)

Principal Investigator: Michael Morgan (mcmorgan@wisc.edu), Professor of Atmospheric and Oceanic Sciences.
Co-Principal Investigator: Brett Hoover, Space Science and Engineering Center.

This project will develop an ensemble weather prediction system for American Family Insurance that will provide high-resolution weather forecasting run entirely in cloud computing infrastructure. This project will improve the accuracy of forecasting hazardous weather by producing many realizations of the same forecast from slightly varying initial conditions.

The probabilistic forecasts will provide advance warning not only of hazards including hail, wind gusts, and hurricane impacts in targeted regions, but also of the uncertainty associated with the predictability of these hazards. This novel research will provide a state-of-the-science technique in regional weather modeling.
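The ensemble idea described above can be sketched in a few lines: run the same model many times from slightly perturbed starting points, then read a probability off the fraction of runs that exceed a hazard threshold. The sketch below uses the chaotic logistic map purely as a hypothetical stand-in for a forecast model; it is not a weather model, and the names `toy_model`, `ensemble_forecast`, and `exceedance_probability` and all parameter values are assumptions.

```python
import random


def toy_model(x0, steps=20, r=3.9):
    """Hypothetical stand-in for a forecast model: the chaotic logistic map."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x


def ensemble_forecast(x0, members=50, spread=1e-3, seed=0):
    """Run the model from many slightly perturbed initial conditions."""
    rng = random.Random(seed)
    forecasts = []
    for _ in range(members):
        # Perturb the initial state and clamp it to the model's valid range.
        start = min(max(x0 + rng.gauss(0, spread), 0.0), 1.0)
        forecasts.append(toy_model(start))
    return forecasts


def exceedance_probability(forecasts, threshold):
    """Fraction of ensemble members whose forecast exceeds the threshold."""
    return sum(f > threshold for f in forecasts) / len(forecasts)
```

The spread of the member forecasts gives a simple view of predictability: tightly clustered members suggest a confident forecast, while widely scattered members signal high uncertainty.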

Model Recycling: Accelerating Machine Learning by Re-using Past Computations (PI: Shivaram Venkataraman; Co-PI: Dimitris Papailiopoulos)

Principal Investigator: Shivaram Venkataraman (shivaram@cs.wisc.edu), Assistant Professor of Computer Sciences.
Co-Principal Investigator: Dimitris Papailiopoulos, Electrical and Computer Engineering.

Data scientists train machine learning models that are used in a wide range of domains, from drug discovery to recommendation engines. Training a machine learning model, and fine-tuning the parameters that control how well a model performs, take significant time and resources. The process of incremental fine-tuning is often manual and involves retraining models from scratch. This project will automate and accelerate this process of fine-tuning by reusing and sharing past computations from prior training jobs, using a technique called model recycling. The researchers will develop a software framework that can help data scientists accelerate model fine-tuning, and a proposed intelligent predictor that can automatically save prior computation results, based on their importance.

Question Asking with Differing Knowledge and Goals (PI: Joe Austerweil. Continuation from Spring 2020)

Principal Investigator: Joe Austerweil (austerweil@wisc.edu), Assistant Professor of Psychology

People spend a significant proportion of their time asking each other questions to gather information. Entire professions, such as academia and customer service, are dedicated to asking and answering questions. Despite tremendous progress in machine learning, automated methods that answer a person’s questions are still inferior to answers from people.

Why are people better at answering questions? One reason is that question-askers leave out information that those answering the questions can fill in from their rich knowledge of language and the world. A recent machine learning method addresses this issue by asking multiple, reformulated versions of a human question, providing multiple answers, and learning to select the answer that is most likely to satisfy a person. However, this is done purely from data and does not incorporate psycholinguistic research demonstrating that people prefer simpler answers that are tailored to their personal goals and knowledge.

This project investigates whether incorporating psycholinguistic factors can improve automated question-answering methods. If so, then researchers can test novel, potential psycholinguistic factors and learn more about the underlying mechanisms that enable people to answer questions.

Lightweight Natural Language and Vision Algorithms for Data Analysis (PI: Vikas Singh; Co-PI: Zhanpeng Zeng. Continuation from Spring 2020)

Principal Investigator: Vikas Singh (vsingh@biostat.wisc.edu), Professor of Biostatistics and Medical Informatics

Collaborators: Zhanpeng Zeng (Computer Sciences), Shailesh Acharya and Glenn Fung (American Family Insurance)

Natural language processing is a form of artificial intelligence that helps computers read and understand human language. Efficient and accurate natural language processing models are central to various applications but have a significant computational footprint. The overarching goal of this project is to accelerate the time it takes to train and test these models by developing alternative solutions that are based on much faster image processing primitives.

Spring 2020 Awards

Question Asking with Differing Knowledge and Goals (PI: Joe Austerweil)

Principal Investigator: Joe Austerweil (austerweil@wisc.edu), Assistant Professor of Psychology

Using Data to Foster Entrepreneurship and Innovation in the Madison Ecosystem (PI: Jon Eckhardt)

Principal Investigator: Jon Eckhardt (jon.eckhardt@wisc.edu), Associate Professor of Business

Collaborators: Brent Goldfarb (U Maryland), Molly Carnes (WISELI)

Entrepreneurship is an important path for upward mobility and wealth creation. Student entrepreneurship matters, in part, because student startups are not necessarily modest endeavors. In 1979, recent UW-Madison graduate Judy Faulkner founded the electronic medical records company Epic, which today employs over 10,000 people. Research indicates that student entrepreneurship at UW-Madison is surprisingly prevalent. Despite the impact of student entrepreneurship, little is known about what drives entrepreneurial intentions, such as an interest in starting a company, and entrepreneurial activity among students. Further, female students are less than half as likely as male students to self-report entrepreneurial intentions or actions.

The goal of this project is to support the work of the Academic Entrepreneurship Study Team at UW-Madison. This team is using data analysis techniques to enhance the impact and management of entrepreneurship programs at UW-Madison and other U.S. universities. Insights from this research will support the creation of evidence-based interventions to increase the prevalence and effectiveness of student entrepreneurship.

Machine Learning for Usage-Based Insurance (PI: Robert Holz; Co-PI: Willem Marais)

Principal Investigator: Robert Holz (reholz@ssec.wisc.edu), Senior Scientist, Space Science and Engineering Center
Co-PI: Willem Marais (Space Science and Engineering Center)
Collaborator: Rebecca Willett (University of Chicago)

Usage-based insurance (UBI) is a type of vehicle insurance where the costs depend on the user’s type of vehicle, distance travelled, speed and driving behavior. The goals of UBI are to enable insurers to promote safer driving behavior, reduce the frequency and magnitude of auto accidents, and help reduce costs to insurers and drivers.

Data collected for UBI primarily consist of GPS locations collected from smartphones. Additionally, ancillary datasets provide information on speed restrictions, lane information, points of interest and functional road classifications. Together, these data can be used to classify driving behaviors at different risk levels.

This project investigates machine learning methods that analyze very large UBI datasets in order to produce a measure of driver risk and safety. A key technical question of the investigation is how to accurately model UBI data that will allow for an effective and robust measure.

Optimizing Question and Answer Systems via User Feedback (PI: Robert Nowak)

Principal Investigator: Robert Nowak (nowak@engr.wisc.edu), Wisconsin Institute for Discovery and Professor of Electrical and Computer Engineering

Question-and-Answer (Q&A) systems are online software systems that aim to answer questions asked by users. Such systems are increasingly common throughout business, industry and healthcare. This project aims to develop new theory and methods for optimizing Q&A systems based on user feedback.

This project will begin with text embeddings that map words, sentences and whole documents into numerical representations that find similarities and connections in language. The research will draw on recent advances in the field of multi-armed bandit problems—a modeling approach that balances the choice of acquiring new knowledge with the competing choice of relying only on existing knowledge—to explore new approaches for Q&A systems. The research team will develop scalable algorithms for these systems with attention to search optimization and computation time, as human users of Q&A systems will not tolerate large delays in receiving answers to questions.
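The explore-versus-exploit trade-off behind multi-armed bandit problems can be illustrated with epsilon-greedy, one of the simplest bandit strategies. In the sketch below, each "arm" stands for a candidate answer whose chance of satisfying the user is unknown; the simulated feedback, the function name `epsilon_greedy`, and all parameter values are assumptions for illustration, not the algorithms this project will develop.

```python
import random


def epsilon_greedy(true_rates, rounds=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over arms with hidden success rates.

    true_rates: hidden probability that each arm yields positive feedback.
    Returns the number of pulls per arm and the estimated success rates.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # how often each arm was tried
    values = [0.0] * n_arms  # running mean of the feedback per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random arm to acquire new knowledge.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: rely on the arm that currently looks best.
            arm = max(range(n_arms), key=values.__getitem__)
        # Simulated user feedback: 1.0 for a helpful answer, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

A call such as `epsilon_greedy([0.2, 0.5, 0.8])` returns how often each candidate was shown and the estimated satisfaction rate for each; a larger `epsilon` spends more rounds exploring at the cost of showing the current best answer less often.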

Improving Traffic Safety Outcomes Through Data Science (PI: David Noyce)

Principal Investigator: David Noyce (danoyce@wisc.edu), Professor and Associate Dean, College of Engineering

While advances during the last 40 years in vehicle design, traffic engineering and driver behavior have led to significant improvements in transportation safety, recent trends have shown a leveling—and in some cases an increase—in the number of traffic crash fatalities. Emerging data provide new opportunities for incentives and technologies that move the trend towards zero fatalities once again. However, there are vital research questions about which technologies hold the most promise and how these different solutions work together to help drivers make informed, safe decisions.

The vision for this research is to translate advances in automotive technology and data science into tools that will improve driver safety and bolster the safety performance of emerging technologies, such as advanced driver assistance systems and automated vehicles. The researchers will conduct collaborative data science research, including machine learning and other approaches, to develop algorithms focused on incentivizing positive driver behavior. Researchers will also quantify the safety performance of emerging technologies, filling information gaps for automated vehicle developers, insurance companies, policy makers and the public.

Learning Causal Relationships from Data (PI: Irene Ong; Co-PI: Aubrey Barnard)

Principal Investigator: Irene Ong (irene.ong@wisc.edu), Assistant Professor of Obstetrics and Gynecology and Biostatistics and Medical Informatics, School of Medicine and Public Health

Co-PI: Aubrey Barnard (Biostatistics and Medical Informatics)

Humans naturally develop an understanding of cause and effect by exploring the world. But causality is not nearly so easy for machines to learn. As a result, causal understanding is often missing from artificially intelligent systems, as you may have noticed when your digital assistant goes awry. To help improve the causal reasoning abilities of such systems, this research project develops an algorithm for learning causal relationships from data, one that is more efficient, accurate and robust than similar algorithms. These characteristics make causal learning more usable and likely to be incorporated into systems like your digital assistant in the future.

For the time being, the causal learning algorithm will be applied to discovering the environmental factors that prevent or cause asthma, and to identifying relationships in electronic health data that will help prevent severe drug reactions and improve patient care by tailoring it to each individual patient.

3D Capture and Scanning Technology for Insurance Documentation (PI: Kevin Ponto)

Principal Investigator: Kevin Ponto (kbponto@wisc.edu), Associate Professor, School of Human Ecology

Insurance claims adjusters constantly face the challenge of inspecting and assessing a scene to understand potential risk, or what took place after an event. They typically do this using tools such as digital photography. Recent advances in 3D capture technologies have created new ways to digitize the world around us. The overall goal of this project is to design and implement a system that utilizes 3D scanning and capture technology for automated documentation of scenes. This has the potential to reduce disputes between insurance companies and their clients, saving money and time for both parties.

As the utilization of 3D capture technology in this area is quite novel, and upcoming technological changes may create new directions of inquiry, the project will focus on research and design of an automated inventory system. This work will provide foundational knowledge for how 3D capture technologies may benefit the insurance industry.

Lightweight Natural Language and Vision Algorithms for Data Analysis (PI: Vikas Singh; Co-PI: Zhanpeng Zeng)

Principal Investigator: Vikas Singh (vsingh@biostat.wisc.edu), Professor of Biostatistics and Medical Informatics

Collaborators: Zhanpeng Zeng (Computer Sciences), Shailesh Acharya and Glenn Fung (American Family Insurance)

Ultra-Fast Training for the Third Wave of Artificial Intelligence: Novel Categories in Text Classification (PI: Jerry Zhu)

Principal Investigator: Jerry Zhu (jerryzhu@cs.wisc.edu), Professor of Computer Sciences

The first wave of artificial intelligence (AI) emerged in the 1980s as expert systems that apply rules to deduce new facts. In the 2000s, the second wave of AI emerged as statistical machine learning, including deep learning. Second-wave AI networks are trained on enormous data sets labeled to recognize patterns. The yet-to-come third wave of AI is predicted to combine and supersede the first two waves. Third-wave AI systems will require far fewer data items for training, and will apply rules in ways that are more similar to human cognition.

This project aims to take a step toward the third wave of AI by allowing data scientists to train a classifier (the “brain of AI”) using intuitive data transformation rules. This contrasts with second-wave AI, where the data scientists must label training data. We expect that providing rules instead of labels will achieve faster, better training. This project will focus on text classification used by businesses, with the aim of producing more agile text classifiers with fewer human resources.

Co-PI: Ara Vartanian (Computer Sciences)