ISMB 2024 Tutorial
A Practical Introduction to Large Language Models in Biomedical Data Science Research
July 2024, Virtual
About This Tutorial

Part 1: Monday, July 8, 2024 14:00 – 18:00 EDT

Part 2: Tuesday, July 9, 2024 14:00 – 18:00 EDT

Large Language Models (LLMs) such as ChatGPT have shown remarkable capabilities in understanding and generating language across diverse disciplines. In biomedical data science and computational biology, LLMs can significantly aid information accessibility, data analysis, and knowledge discovery. This tutorial offers an introductory, hands-on guide to understanding and using LLMs in biomedical data science. We begin by leveling the playing field with introductions to LLMs and to biomedical data science. We then turn to core applications of LLMs in biomedical data science and computational biology: retrieval-augmented generation, database functionalities, and code generation. To prompt discussion, we present relevant case studies that show how LLMs can bridge the gap between technical feasibility and practical utility in biomedical data science. Hands-on exercises let participants apply what they learn in real time. Participants will also become familiar with OpenAI's ChatGPT and with open-source LLMs, including their design, use cases, limitations, and prospects.

View on ISMB 2024: https://www.iscb.org/ismb2024/programme-schedule/tutorials#vt1

Tutorial registration: https://www.iscb.org/ismb2024/register#tutorials

Room links: https://iscb.junolive.co/ISMB24/live/breakouts/ismb2024_tutorialvt1-1 and https://iscb.junolive.co/ISMB24/live/breakouts/ismb2024_tutorialvt1-2

Organizers
Robert Xiangru Tang
Yale University
Qiao Jin
NCBI/NLM/NIH
Hufeng Zhou
Harvard University
Shubo Tian
NCBI/NLM/NIH
Zhiyong Lu
NCBI/NLM/NIH
Mark Gerstein
Yale University
Schedule

All times are in EDT (Montreal, Quebec, Canada local time).

Time Section Presenter
Part 1 (Monday, July 8, 2024)
14:00 - 14:10 Overview and Welcome [Slides] Robert Tang
14:10 - 14:40 Introduction to LLMs with a Focus on Biomedical Data Science [Slides] Shubo Tian
14:40 - 15:10 How to Use GPT-3.5 and GPT-4 with Python [Slides] Qiao Jin
15:10 - 15:30 How to Use Open-source LLMs with Python [Slides] Robert Tang
15:30 - 15:45 Coffee Break
15:45 - 16:10 Code Generation in Bioinformatics [Slides] Robert Tang
16:10 - 16:35 Retrieval-Augmented Generation with Large Language Models [Slides] Qiao Jin
16:35 - 17:00 Querying PubMed with RAG to Answer Biomedical Questions with GPT-4 [Slides] Qiao Jin
Part 2 (Tuesday, July 9, 2024)
14:00 - 14:45 Large Language Models for Biomedicine: from PubMed Search to Gene Set Analysis [Slides] Zhiyong Lu
14:45 - 15:30 Developing Computational Representations of Disease-Relevant Molecules: Three Case Studies for AI in Biomedicine [Slides] Mark Gerstein
15:30 - 15:45 Coffee Break
15:45 - 16:10 Integrating Biomedical Data Database Development with LLMs [Slides] Hufeng Zhou
16:10 - 16:35 Database Query Generation with LLMs [Slides] Hufeng Zhou
16:35 Closing Remarks Robert Tang
*Participants should join our tutorial via Junolive.