ISMB 2024 Tutorial
A Practical Introduction to Large Language Models in Biomedical Data Science Research
July 2024, Virtual on Zoom
About This Tutorial

Large Language Models (LLMs) such as ChatGPT have shown remarkable capabilities in understanding and generating language across diverse disciplines. In biomedical data science and computational biology, LLMs can significantly improve information access, data analysis, and knowledge discovery. This tutorial offers an introductory-level, hands-on guide to understanding and using LLMs in biomedical data science. It begins by establishing common ground with introductions to LLMs and to biomedical data science. It then covers core applications of LLMs in biomedical data science and computational biology: retrieval-augmented generation, database functionalities, and code generation. Relevant case studies will be discussed, with emphasis on how LLMs can bridge the gap between technical feasibility and practical utility in biomedical data science. Hands-on exercises let participants apply what they learn in real time. Participants will also become acquainted with OpenAI's ChatGPT and open-source LLMs, including their design, use cases, limitations, and prospects.

View on ISMB 2024: https://www.iscb.org/ismb2024/programme-schedule/tutorials#vt1

Tutorial registration: https://www.iscb.org/ismb2024/register#tutorials

Organizers
Robert Xiangru Tang, Yale University
Qiao Jin, NCBI/NLM/NIH
Hufeng Zhou, Harvard University
Shubo Tian, NCBI/NLM/NIH
Zhiyong Lu, NCBI/NLM/NIH
Mark Gerstein, Yale University
Schedule

All times are in EDT (local time in Montreal, Quebec, Canada).

Part 1 (Monday, July 8, 2024)
Time | Section | Presenter
14:00 - 14:10 | Overview and Welcome | Robert Xiangru Tang
14:10 - 14:40 | Introduction to LLMs with a Focus on Biomedical Data Science | Shubo Tian
14:40 - 15:10 | How to Use GPT-3.5 and GPT-4 with Python | Qiao Jin
15:10 - 15:30 | How to Use Open-source LLMs with Python | Robert Xiangru Tang
15:30 - 15:45 | Coffee Break |
15:45 - 16:10 | Database Query Generation with LLMs | Hufeng Zhou
16:10 - 16:35 | Retrieval-Augmented Generation with Large Language Models | Qiao Jin
16:35 - 17:00 | Code Generation in Bioinformatics | Robert Xiangru Tang

Part 2 (Tuesday, July 9, 2024)
Time | Section | Presenter
14:00 - 14:45 | Large Language Models for Biomedicine: from PubMed Search to Gene Set Analysis | Zhiyong Lu
14:45 - 15:30 | AI in Biomedicine: Developing Representations of Disease-Relevant Molecules | Mark Gerstein
15:30 - 15:45 | Coffee Break |
15:45 - 16:10 | Integrating Biomedical Data Database Development with LLMs | Hufeng Zhou
16:10 - 16:35 | Querying PubMed with RAG to Answer Biomedical Questions with GPT-4 | Qiao Jin
16:35 - 16:55 | Code Generation in Bioinformatics with Open-source LLMs | Robert Xiangru Tang
16:55 - 17:00 | Closing Remarks | Robert Xiangru Tang

*Slides are subject to updates, and full paper lists will be posted soon.
*Q&A will be handled through Rocket.Chat and Zoom.