Project collaborators
Kyle Cranmer and Jason Lo, Data Science Institute
Project start and end dates
7/1/2023 – 9/30/2023
Project summary
We created an automated pipeline that continuously updates a website dedicated to the emerging field of simulation-based inference (SBI). SBI is a family of techniques for principled statistical inference when a model is defined by a simulator whose likelihood is intractable. It is used across a diverse range of disciplines, including astrophysics, evolutionary biology, systems biology, neuroscience, economics, and particle physics. By automatically curating information from various sources and using large language models (LLMs) to categorize it, the website provides a single entry point to research papers, software, and related publications, which makes it a valuable hub for advancing SBI research. The platform also promotes cross-disciplinary collaboration, fostering the exchange of ideas and accelerating scientific progress.
The simulation-based-inference.org website is powered by an automated collection and curation system that integrates several APIs with a minimalist backend. The process begins with SerpApi, which collects new academic publications from sources such as Google Scholar so that the site stays up to date. Additional metadata for each publication, such as its category, is then fetched from the arXiv and bioRxiv APIs. If no category is available from arXiv or bioRxiv, the title and abstract are sent to OpenAI for classification. This enriched metadata improves searchability and organization. All data is stored in a single YAML file, chosen for its simplicity and its seamless integration with static site generators, which removes the need for a complex database.
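The category step described above (preprint-server lookup first, LLM only as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: a paper record is assumed to be a dict with hypothetical `title` and `abstract` keys; the arXiv lookup uses the public arXiv API (`http://export.arxiv.org/api/query` with the `id_list` parameter) and reads the `arxiv:primary_category` element of its Atom response; and the LLM classifier is passed in as a function, because the prompt and model sent to OpenAI are not specified here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# XML namespaces used in arXiv API responses (Atom feed plus arXiv extensions).
ATOM_NS = "{http://www.w3.org/2005/Atom}"
ARXIV_NS = "{http://arxiv.org/schemas/atom}"


def fetch_arxiv_atom(arxiv_id: str) -> str:
    """Download the Atom feed for one paper from the public arXiv API."""
    query = urllib.parse.urlencode({"id_list": arxiv_id})
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def arxiv_primary_category(atom_xml: str) -> Optional[str]:
    """Return the primary category (e.g. 'stat.ML') of the first entry, or None."""
    root = ET.fromstring(atom_xml)
    entry = root.find(f"{ATOM_NS}entry")
    if entry is None:
        return None
    primary = entry.find(f"{ARXIV_NS}primary_category")
    return primary.get("term") if primary is not None else None


def categorize(paper: dict,
               lookup: Callable[[dict], Optional[str]],
               classify: Callable[[str, str], str]) -> dict:
    """Attach a category, preferring the preprint server and falling back to an LLM."""
    category = lookup(paper)
    if not category:
        # Hypothetical LLM classifier: only the title and abstract are sent.
        category = classify(paper["title"], paper.get("abstract", ""))
    return {**paper, "category": category}
```

In use, `lookup` could be `lambda p: arxiv_primary_category(fetch_arxiv_atom(p["arxiv_id"]))`, where `arxiv_id` is a hypothetical field, and a bioRxiv lookup could be chained into the same parameter. Injecting both functions keeps the fallback logic testable without network access or an API key.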
Jekyll, a static site generator, converts the YAML data into HTML, and the site is deployed with GitHub Pages. The entire process is automated with a GitHub Actions workflow that runs weekly to fetch new publications, update their metadata, and regenerate the site content, keeping the website current with minimal human intervention.
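Jekyll exposes data files placed in its `_data/` directory as `site.data`, and a Liquid template loops over the entries to emit HTML. The Python sketch below is only an analogue of that transformation, meant to show what the YAML-to-HTML step does; the real site performs it with Jekyll, not Python. It assumes each record has hypothetical `title`, `link`, and `category` fields and groups entries by category.

```python
import html
from collections import defaultdict


def render_papers(papers: list[dict]) -> str:
    """Group paper records by category and emit a static HTML fragment."""
    by_category: dict[str, list[dict]] = defaultdict(list)
    for paper in papers:
        by_category[paper.get("category", "Uncategorized")].append(paper)

    parts = []
    for category in sorted(by_category):
        parts.append(f"<h2>{html.escape(category)}</h2>")
        parts.append("<ul>")
        for paper in by_category[category]:
            # Escape user-supplied text so titles like "A & B" render safely.
            title = html.escape(paper["title"])
            link = html.escape(paper.get("link", "#"), quote=True)
            parts.append(f'  <li><a href="{link}">{title}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)
```

Because the output is plain HTML generated ahead of time, GitHub Pages can serve it without any server-side code, which is what lets the backend stay a single YAML file.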
For manual contributions, the website uses GitHub Issues to let the public submit tools, papers, and job postings, as well as general feedback and corrections.
In summary, this streamlined, automated pipeline efficiently keeps the website updated with the latest research in simulation-based inference, while GitHub Issues give the community a way to contribute.