Large Language Models (LLMs) such as ChatGPT have shown remarkable capabilities in understanding and generating language across diverse disciplines. In biomedical data science and computational biology, LLMs can substantially aid information access, data analysis, and knowledge discovery. This tutorial offers an introductory-level, hands-on guide to understanding and using LLMs in biomedical data science. It begins by establishing common ground with introductions to LLMs and to biomedical data science. It then covers the core applications of LLMs in biomedical data science and computational biology: retrieval-augmented generation, database functionalities, and code generation. Case studies show how LLMs can bridge the gap between technical feasibility and practical utility in biomedical data science, and hands-on exercises let participants apply what they learn during the sessions. Participants will also become acquainted with OpenAI's ChatGPT and open-source LLMs, including their design, use cases, limitations, and prospects.
View on ISMB 2024: https://www.iscb.org/ismb2024/programme-schedule/tutorials#vt1
Tutorial registration: https://www.iscb.org/ismb2024/register#tutorials
Presenter | Affiliation |
---|---|
Robert Xiangru Tang | Yale University |
Qiao Jin | NCBI/NLM/NIH |
Hufeng Zhou | Harvard University |
Shubo Tian | NCBI/NLM/NIH |
Zhiyong Lu | NCBI/NLM/NIH |
Mark Gerstein | Yale University |

Time in EDT (Montreal, Quebec, Canada Local Time)
Time | Section | Presenter |
---|---|---|
Part 1 (Monday, July 8, 2024) | | |
14:00 - 14:10 | Overview and Welcome | Robert Xiangru Tang |
14:10 - 14:40 | Introduction to LLMs with a Focus on Biomedical Data Science | Shubo Tian |
14:40 - 15:10 | How to Use GPT-3.5 and GPT-4 with Python | Qiao Jin |
15:10 - 15:30 | How to Use Open-source LLMs with Python | Robert Xiangru Tang |
15:30 - 15:45 | Coffee Break | |
15:45 - 16:10 | Database Query Generation with LLMs | Hufeng Zhou |
16:10 - 16:35 | Retrieval-Augmented Generation with Large Language Models | Qiao Jin |
16:35 - 17:00 | Code Generation in Bioinformatics | Robert Xiangru Tang |
Part 2 (Tuesday, July 9, 2024) | | |
14:00 - 14:45 | Large Language Models for Biomedicine: from PubMed Search to Gene Set Analysis | Zhiyong Lu |
14:45 - 15:30 | AI in Biomedicine: Developing Representations of Disease-Relevant Molecules | Mark Gerstein |
15:30 - 15:45 | Coffee Break | |
15:45 - 16:10 | Integrating Biomedical Data Database Development with LLMs | Hufeng Zhou |
16:10 - 16:35 | Querying PubMed with RAG to Answer Biomedical Questions with GPT-4 | Qiao Jin |
16:35 - 16:55 | Code Generation in Bioinformatics with Open-source LLMs | Robert Xiangru Tang |
16:55 - 17:00 | Closing Remarks | Robert Xiangru Tang |
*Slides may be updated, and the full paper lists will be posted soon.
*We will take Q&A through Rocket.Chat and Zoom.