Amassing and Standardizing Genomic Data for Useful Comparisons in Data-Driven Medicine—Kevin Puylaert—SOPHiA GENETICS

April 26, 2019

Approximately 15,000 new genomic profiles are analyzed on the SOPHiA GENETICS platform every month, and nearly 1,000 institutions are sharing and comparing this information for the ultimate goal of better understanding the relationship between genes and cancer and rare disorders, coming up with prognostic information, and identifying the best treatments in light of a patient’s genetic profile. General manager of North American operations and VP of business development at SOPHiA GENETICS, Kevin Puylaert, offers a glimpse into a world where massive amounts of data are being collected and analyzed on a regular basis for the betterment of data-driven medicine. He also discusses the details of the work they’re doing and what’s on the horizon in the coming months and years.

Press play for all the details and check out https://www.sophiagenetics.com/en_US/home.html to learn more.

Podcast Transcript

Richard Jacobs: Hello! This is Richard Jacobs with the Future Tech and Future Tech Health podcast, and I have Kevin Puylaert, the general manager of North America and VP of business development at SOPHiA GENETICS. That’s spelled SOPHiA GENETICS. Kevin, thanks for coming. How are you doing today?

Kevin Puylaert: Pleasure. Very nice. How are you, Richard?

Richard Jacobs: Yeah, good. When I hear the name Sophia, I think of that AI robot, but I guess SOPHiA GENETICS is different, right?

Kevin Puylaert: It is. Sophia in ancient Greek means wisdom, so that’s where the name really comes from.

Richard Jacobs: Oh, okay. So what’s the plan at SOPHiA GENETICS? What do you guys do there?

Kevin Puylaert: SOPHiA is offering a software-as-a-service platform to democratize data-driven medicine. We have a few applications in a few different areas, and our flagship area is genomics. Today we’ve connected a bit more than 900 institutions around the world, nearly a thousand, that are sharing their genomic profiles across the platform. That means that an institution here in Boston is able to see, when they’re analyzing a profile, how it compares to a profile that might have been seen in institutions in Gonzales or in Austin or in Paris. So we’re really building a software-as-a-service platform to be able to access global intelligence.

Richard Jacobs: Oh, that’s great, because I know various companies are trying to get a critical mass of, you know, let’s say 100,000 people sequenced, or 50,000. So you’re connecting all these platforms. What does that mean? How many sequences are available in the total pool?

Kevin Puylaert: Yeah. Today we’ve looked at a bit more than 300,000 profiles. On a monthly basis, we see about 15,000 profiles being analyzed through the platform. This is probably one of the largest consortia of institutions looking at patients at the moment. More patients are being diagnosed on a monthly basis, and players like Foundation Medicine are quite well known in this field as well.

Richard Jacobs: So the platform, again, is able to pull data from many separate platforms. But what kind of analytical tools are layered on top of that? Is that unique to you, or is your role just the aggregation? What else do you do?

Kevin Puylaert: Yeah, it’s a good question. When you want to leverage AI, it’s very important to have very clean and very structured data. When you look into genomic information, very often there’s a lot of noise in that data. So whenever we work with a new institution, a new lab that is doing clinical research, we go through a quite extensive setup program with that lab to establish the performance of that particular lab, and we tailor our pipelines to the actual data generated by that institution, based on the reagents they might be using or the diseases they might be looking into. So we standardize the way the data is generated across those different labs and institutions, and that allows us to share that data. It allows people to look at data coming from a variety of places that is still very comparable. Only because of that can we build tools on that data. One of the tools we build is our AI, which sees how people are interacting with that data. It sees, okay, certain variants are classified as benign, some variants are classified as pathogenic, and from there it learns how to pre-classify similar types of variants. So it will suggest, okay, maybe these variants are more likely to be pathogenic and those variants are more likely to be benign. That makes it very easy for the user to test additional profiles.
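Editor's illustration: the pre-classification idea described above (learning from how users have labeled variants, then suggesting a label for similar ones) can be sketched very roughly as a vote over past decisions. This is not SOPHiA GENETICS' method, which the interview does not detail; the function names, thresholds, and variant identifiers below are all hypothetical.

```python
from collections import Counter, defaultdict

def build_label_history(decisions):
    """Group past user classifications by variant identifier."""
    history = defaultdict(Counter)
    for variant_id, label in decisions:
        history[variant_id][label] += 1
    return history

def suggest_label(history, variant_id, min_votes=3, min_agreement=0.8):
    """Suggest a label only when enough past reviewers agree; otherwise None."""
    votes = history.get(variant_id)
    if not votes:
        return None  # never seen before: leave it to the human reviewer
    total = sum(votes.values())
    label, count = votes.most_common(1)[0]
    if total >= min_votes and count / total >= min_agreement:
        return label
    return None

# Hypothetical decisions: (variant identifier, label chosen by a user).
decisions = [
    ("variant_A", "pathogenic"),
    ("variant_A", "pathogenic"),
    ("variant_A", "pathogenic"),
    ("variant_A", "pathogenic"),
    ("variant_B", "benign"),
    ("variant_B", "pathogenic"),
]
history = build_label_history(decisions)
print(suggest_label(history, "variant_A"))
print(suggest_label(history, "variant_B"))
```

In this sketch a suggestion is withheld when reviewers disagree or when too few have looked at the variant, which mirrors the stated goal of saving time on routine cases while leaving difficult ones to people.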

Richard Jacobs: Are you just looking at the human genome, or are you looking at viruses or other creatures?

Kevin Puylaert: Yeah. We look only at humans at the moment and really focused on the clinical research space.

Richard Jacobs: Yeah because if you’re able to establish a data standardization and you’re aggregating it from many sources, some other company or some other effort could take that and do the same thing for viruses or for plants or for whatever else they want to catalog and you’d have all the infrastructure set up for them to do that. Probably much better than they are right now.

Kevin Puylaert: Yeah, no, that’s true. I think what is essential when we’re looking at human samples, and in particular today we do many, many cancer samples, is that you want to be able to find an answer for each and every profile. Whereas when you’re looking at plants or when you’re doing research, coming to a solution is not as essential. You might be able to look at another plant. The stakes are not as high as when you are looking at a cancer sample and you want to make sure that you’re finding the causative mutations so as to be able to recommend certain treatments for that patient.

Richard Jacobs: What are some interesting correlations or findings that you’re seeing in the data, because you have access to so much of it?

Kevin Puylaert: So one of the interesting things we saw when we were observing a particular disease area, for example hereditary cancer, is that after we had seen 30,000 samples our AI was 98% as good as a human doing the pre-classification. It means that it’s saving people a lot of time on the routine cases and letting them focus more on difficult cases.

Richard Jacobs: Okay. So 98% in determining what?

Kevin Puylaert: In determining that a particular sequence would be pathogenic in a particular sample.

Richard Jacobs: Oh, okay. So you’re looking just at the genome or you’re looking at DNA or RNA or are you looking at proteins or molecules. What are you looking at?

Kevin Puylaert: Yeah, so we’re really there to offer tools for whatever labs are using at the moment, for whatever is established in the field, and what we’re seeing is that DNA is very prevalent. There’s a large amount of DNA testing, and RNA is starting: we see quite a few labs starting to look at gene fusions with RNA. So we’ve mostly been focusing on those two applications.

Richard Jacobs: Okay. All right, so you’re able to tell, again, if a certain gene is associated with cancer, an oncogene, I guess. And what kind of insights are you getting with the RNA? Any other insights?

Kevin Puylaert: Yeah, so you can look at Mendelian disorders, hereditary disorders, or oncology. In oncology, we will be able to associate certain treatments, we will be able to associate certain clinical trials, and we will be able to give some prognostic information based on the variety of mutations found in the profiles, either from DNA or from RNA. If we look into Mendelian disorders, there we will be able to see, okay, this is the causative mutation. This is most likely the disease that the patient is suffering from, based on the genetic profiling that was done for that particular patient.

Richard Jacobs: And what about taking it further, though? I mean, does this give you any insights into how to assist with drug discovery or how to find immunotherapies or other types of therapies for cancer?

Kevin Puylaert: So yeah, it will definitely recommend therapies. Today we haven’t done much with respect to drug discovery. It’s been mostly used in the clinical research space, which is mostly trying to diagnose certain samples right away rather than doing retrospective studies.

Richard Jacobs: So what are some of the goals for the next year or two? Do you want to get to a million genomes on file or what’s the big goal?

Kevin Puylaert: Yeah, so as I mentioned, our flagship area is genomics, but about six months ago we started to invest significantly in radiomics, where we do the analysis of radiology images, and we invest significantly in what we call trial matching. All of this is in line with our long-term vision of building data-driven medicine and finding as many sources of data as possible that characterize particular patients, so that you can say, okay, this profile looks like so many other profiles we’ve seen around the world. To make it more tangible: this profile looks like 50,000 other profiles we have seen that are very similar, that form a cluster. Out of that cluster, 30% had treatment combination A and B with outcome X, and 40% had treatment combination C and D with outcome Y. And so the oncologist has a lot more data in hand to be able to give a very precise recommendation with outcomes.
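Editor's illustration: the cluster summary described above (what share of similar profiles received each treatment combination, and with which outcome) amounts to a grouped count. The sketch below uses toy data scaled down to ten profiles; the function name and the record layout are assumptions, not anything taken from the platform.

```python
from collections import Counter

def summarize_cohort(cohort):
    """Map each treatment to (share of cohort, {outcome: share within treatment})."""
    by_treatment = Counter(treatment for treatment, _ in cohort)
    by_pair = Counter(cohort)
    total = len(cohort)
    summary = {}
    for treatment, n in by_treatment.items():
        outcomes = {
            outcome: count / n
            for (t, outcome), count in by_pair.items()
            if t == treatment
        }
        summary[treatment] = (n / total, outcomes)
    return summary

# Toy cluster of similar profiles: (treatment combination, observed outcome).
cohort = [("A+B", "X")] * 3 + [("C+D", "Y")] * 4 + [("other", "Z")] * 3
summary = summarize_cohort(cohort)
for treatment, (share, outcomes) in summary.items():
    print(f"{treatment}: {share:.0%} of cluster, outcomes {outcomes}")
```

A real system would first have to decide which profiles count as similar, which is the hard part and is not addressed here.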

Richard Jacobs: Any surprises about the prevalence of certain cancers or other conditions?

Kevin Puylaert: I think at this stage of development we reflect quite well the prevalence of different cancer types around the world. So no particular surprises.

Richard Jacobs: Oh, so the prevalence of certain conditions doesn’t seem to follow any pattern by country, age, or gender? Or have you seen skews in any of the conditions you’re looking at?

Kevin Puylaert: To be honest, I don’t know. We’re quite sensitive to the way we use the data that is being analyzed through the platform. We’re originally a Swiss company based in Lausanne, and we now have headquarters in Boston as well, so data privacy is very important to us. We can only use the data that has been shared through the platform for certain very well-defined goals. And so we haven’t done that many studies of that kind.

Richard Jacobs: Okay. So again, what is the goal over the next year or two? Where do you want to take things?

Kevin Puylaert: So we’re seeing phenomenal growth at the moment. We’re now seeing about 15,000 patients a month, and we’re seeing that number grow at least 50% every six months. So today we’re really trying to onboard as many institutions as possible to participate in this effort. Because if you look at Mendelian disorders, the diseases that we’re studying are very rare, so you need a very large set of profiles being shared across the community if you want to infer something smart for the particular profile you’re looking at. So today we’re really in a phase where we’re onboarding institutions and making the community grow.
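Editor's illustration: to get a feel for what "at least 50% every six months" implies, the sketch below compounds the quoted 15,000 profiles a month over a few six-month periods. It assumes the rate simply continues unchanged, which the interview does not claim; the function name is hypothetical.

```python
def project_monthly_profiles(current, growth_per_period, periods):
    """Compound a monthly volume over a number of equal growth periods."""
    return current * (1 + growth_per_period) ** periods

# 15,000 profiles per month, growing 50% per six-month period.
after_one_year = project_monthly_profiles(15_000, 0.5, 2)   # two periods
after_two_years = project_monthly_profiles(15_000, 0.5, 4)  # four periods
print(after_one_year, after_two_years)
```

Under that assumption the monthly volume would more than double within a year (33,750) and reach roughly five times today's level within two years (about 75,900).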

Richard Jacobs: Any learnings that you’re getting about the need to standardize data? Are you talking to the companies that actually do the collection of the DNA and saying, hey, don’t forget to do this or to test that or add this in, so that the standardization is easier or more complete?

Kevin Puylaert: Yeah, very good question, because it’s something that is really underestimated in this industry, and I feel in most industries that are speaking about data and data collection. People don’t realize how important it is to have standardized and very clean data to be shared. I think the way we’ve grown is by being exposed to a variety of institutions that were all giving us their own experience, and we share that experience from all our users across the community in terms of how to generate the cleanest data. Today, during our setup program, one of the essential parts of what we do is making recommendations to our users on how they can tweak their lab to get the optimal data, if we speak about genomics. It’s similar in radiomics, where different users are looking at images from the same patients over time and not always structuring the data in similar ways. So it’s very important to have tools that structure the data and make sure the tumor is extracted in a similar way for that same patient over time, no matter who is looking at the image, so that the data can then be compared and you can get some intelligent findings from those images.

Richard Jacobs: As it relates to the images, I have spoken to a few companies that are increasing the quality of the images, the resolution, etc. Do you keep tabs on that? You’re getting data, but the quality and the nature of that data may change over time, and maybe if you communicate with some or all of the data providers, you’ll find ones out there that are advancing in certain ways. Maybe there’s a new compression algorithm, or a higher resolution for an MRI or something.

Kevin Puylaert: Yeah. So in genomics, which is a field where we’ve been known for a few years, we have very close discussions with some of the large reagent manufacturers, and we have partnerships with many of them where we’re feeding back some of the knowledge we gained in standardizing the data across so many labs. In radiology, we’re having those discussions with large imaging providers. We try, both in genomics and in radiomics, to start from the raw data, so as not to have too many layers between us and the actual instruments. But the fields are moving at different speeds in terms of the machines that are generating the data. So we haven’t yet had that much influence in radiology in making sure that MRIs or CT scans are evolving in ways that would make it easier to standardize the data.

Richard Jacobs: Yeah, I just figured if anyone’s going to see it first, it would probably be you, as you are connected to all these places that are doing this.

Kevin Puylaert: Yeah, you’re right. It’s something we should be exploring in radiology.

Richard Jacobs: Okay. Well, very good. So what’s the best way for people to learn more about SOPHiA GENETICS?

Kevin Puylaert: So we have a nice website, sophiagenetics.com, where you’ll find a lot of information, and from there you can ask us to reach out for a chat as well. We’re also at most conferences around genomics or oncology, both nationally and internationally. We now have users in, I think, over 70 countries, so we have a presence that reflects that as well.

Richard Jacobs: Okay. Well very good. Well, Kevin, thanks for coming on the podcast. I really appreciate it.

Kevin Puylaert: My pleasure Richard! Have a good afternoon.