Introducing Jasmine Baker: bioinformatician and LEARN mentor
Jasmine Baker has worked in genomics and bioinformatics for approximately 8 years, focusing on translational and clinical research. Her journey into this space began during her PhD at Louisiana State University (LA, USA), where she became fascinated by the insights that can be gained from sequencing data and by the computational pipelines that streamline its analysis.
Jasmine recently joined BioTechniques’ LEARN as a mentor, providing expertise in the computational biology field. In this interview, we get to know her, the bioinformatics work she engages in and the resources she plans to contribute to LEARN.
What inspired you to pursue a career in bioinformatics?
I find it fascinating how much information can be pulled from the data that we carry with us in our bodies every day and how this can be used to help others. I realized that computational tools are the keys to unlocking this data. Recognizing that computational analyses were going to transform our understanding of human health propelled me to dive deeper into bioinformatics and stay in the space of genomics and human studies. Bioinformatics offered me a bridge between the traditional theories and concepts you learn about in biology and the data analysis work that is essential for making translational strides and contributing to our understanding of human health.
What is your approach to developing computational scripts for next-generation sequencing data analysis?
Depending on your end goal, there is a variety of source code and packages that you need to learn to analyze DNA sequencing data. It also depends on the data you are using; if you're trying to analyze RNA sequencing data, for instance, there are different packages you need to learn. Most importantly, because it is at the core of all this work, you need to learn how to program in Python and R. Only then can you use these languages to develop scripts that pull in the raw data and build pipelines.
At this point, you can start automating, developing systems that filter the data and perform quality control to get it ready for analysis. A lot of the scripts that I wrote cleaned up data, filtered it and assembled genomes. However, many scripts also revolved around stitching different software tools together, allowing me to analyze massive datasets without having to do it manually. Bioinformatics also requires being able to use statistics and feeling comfortable diving into the theory behind molecular evolution and how the genome changes. So, there are multiple aspects to consider when developing computational pipelines.
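To illustrate the kind of filtering and quality-control step described above, here is a minimal Python sketch. It is not Jasmine's own code: the function names, the default thresholds (minimum read length 50, mean Phred quality 20) and the assumption of well-formed four-line FASTQ records with Phred+33 quality encoding are all hypothetical choices made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from FASTQ text, assuming well-formed 4-line records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it)
        next(it)  # the '+' separator line is not needed
        qual = next(it)
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record], min_len: int = 50,
                 min_mean_q: float = 20.0) -> Iterator[Record]:
    """Keep reads that are long enough and have acceptable mean quality."""
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield header, seq, qual


# Usage on a tiny in-memory example: 'I' encodes Q40, '#' encodes Q2
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACGTACGTAC", "+", "##########",
]
kept = list(filter_reads(parse_fastq(example), min_len=5, min_mean_q=20))
print([header for header, _, _ in kept])  # ['@read1']
```

Because each function consumes and yields records lazily, steps like these can be chained into a larger pipeline without loading an entire sequencing run into memory.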
What challenges currently exist in the field of bioinformatics for sequencing?
I think one of the biggest challenges is staying abreast of the constant advancements in sequencing technology. There is so much data being produced, and people are constantly trying to find ways to analyze it. Now we are getting to the point where we have so much data that we need interdisciplinary projects and scientists in various fields involved in the analysis, which is a good problem to have.
Another challenge is ensuring accuracy and reproducibility. In the past, software was rarely open source, meaning you would have to request access from the developers, or you simply couldn't use their code. The good news is that many researchers now put their pipelines and software on GitHub, so the code is constantly evolving and improving thanks to contributions from other scientists. However, the challenge remains, as not everyone has made that transition to open source.
How are collaboration and mentoring influencing computational biology at present?
Collaboration and mentoring are driving the field. As I mentioned, there are complex questions and a great deal of data to work with, which requires expertise from many different fields. Collaboration allows us to ask more focused questions and interpret the results from a biological, statistical and clinical standpoint, so we really get a narrative in our data from bench to bedside.
The mentoring aspect is crucial for the next generation of bioinformaticians. Mentoring provides guidance and fosters a supportive learning environment; this way people can home in on the things that they need to learn and keep abreast of the overwhelming amount of recent technology and data.
What kind of content are you planning on sharing with our audience as a LEARN mentor?
I am excited to share content about genomics, the fundamentals of bioinformatics, and the core concepts and tools used for data analysis. I hope that the topics I discuss help someone navigate and use the bioinformatics tools that are out there.
I also want to showcase some of the real-world applications of bioinformatics and demonstrate how it can help solve important problems in medicine, agriculture and evolution, among other areas. Hopefully in the future I can develop interactive exercises for the BioTechniques audience that foster open discussion; I want learners to feel more comfortable with approaching biological data and problems from the computational biology perspective.