Application of Large Language Models in Biotechnology and Pharmaceutical Research
ProGen
ProGen is a deep-learning language model that generates protein sequences with predictable function across large protein families. It was trained on 280M protein sequences from more than 19,000 families, and it is conditioned on control tags that specify properties of the protein. Fine-tuning ProGen on specific sequences and tags further improves the accuracy of the sequences it generates.
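To make the idea of control tags concrete, the following is a minimal sketch of how tags can be placed in front of an amino-acid sequence so that a language model sees them as conditioning context. It is an illustration only, not ProGen's actual tokenizer or vocabulary: the tag names, the `build_vocab` and `encode` helpers, and the token id layout are all assumptions made for this example.

```python
from typing import Dict, List

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_vocab(control_tags: List[str]) -> Dict[str, int]:
    """Map each control tag and each amino acid to an integer token id.

    Hypothetical layout: tags come first, amino acids after them.
    """
    tokens = [f"<{tag}>" for tag in control_tags] + list(AMINO_ACIDS)
    return {token: index for index, token in enumerate(tokens)}


def encode(tags: List[str], sequence: str, vocab: Dict[str, int]) -> List[int]:
    """Prepend control-tag ids to the amino-acid ids of a sequence.

    A model trained on inputs of this form can learn to associate
    the leading tags with the sequence that follows them.
    """
    tag_ids = [vocab[f"<{tag}>"] for tag in tags]
    sequence_ids = [vocab[residue] for residue in sequence]
    return tag_ids + sequence_ids


# Illustrative tag names (assumed, not taken from ProGen's tag set).
vocab = build_vocab(["lysozyme", "hydrolase"])
token_ids = encode(["lysozyme"], "MKAL", vocab)
print(token_ids)
```

At generation time, the same layout would be used in reverse: the desired tags are supplied as the prompt, and the model continues with amino-acid tokens.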
ChemCrow
Although LLMs perform well on tasks across many domains, they often struggle with chemistry-related problems. They also lack access to external sources of knowledge, which limits their usefulness in scientific research. ChemCrow is an LLM chemistry agent that aims to address both limitations. It is designed to accomplish tasks across drug discovery, organic synthesis, and materials design.
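The general pattern behind an agent of this kind, giving a language model access to external tools, can be sketched as a small tool registry with a dispatch function. The sketch below is a simplified illustration and not ChemCrow's code or API: the `molar_mass` tool, the `run_tool` helper, and the two-entry lookup table are hypothetical stand-ins, and the step where an LLM chooses the tool and its input is omitted.

```python
from typing import Callable, Dict

# Tiny stand-in for an external data source (assumed for illustration).
MOLAR_MASS_G_PER_MOL = {"water": 18.015, "ethanol": 46.07}


def lookup_molar_mass(name: str) -> str:
    """Hypothetical tool: return the molar mass of a named compound."""
    mass = MOLAR_MASS_G_PER_MOL.get(name.lower())
    if mass is None:
        return f"no entry for {name}"
    return f"{mass} g/mol"


# Registry mapping tool names to callables. In a real agent, the LLM
# would be shown these names and asked to pick one for each step.
TOOLS: Dict[str, Callable[[str], str]] = {"molar_mass": lookup_molar_mass}


def run_tool(tool_name: str, tool_input: str) -> str:
    """Dispatch a tool call and return its result as text for the LLM."""
    if tool_name not in TOOLS:
        return f"unknown tool: {tool_name}"
    return TOOLS[tool_name](tool_input)


print(run_tool("molar_mass", "Water"))
```

Returning tool results as plain text keeps the loop simple: the result can be appended to the model's context, and the model can then decide whether to call another tool or answer.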