Implementing AI: The Effects of AI in Drug Development

Join our distinguished Novartis speaker, Shruthi Bharadwaj, PhD, as she speaks about her experience in AI.

With the cost of commercializing a new drug rising dramatically to over $2 billion, AI offers pharma companies an opportunity to bring new drugs and therapies to market faster. The evolution of AI will bring noticeable changes to the future of development and has the potential to accelerate and streamline research. However, this technology is still new and not without its risks and challenges.

Transcript:

Brian Neman:

Thank you everyone for coming. My name is Brian Neman, I’m the founder and CEO of Sanguine and I thank you again for coming. I wanted to quickly introduce the company before we introduce our speaker today Shruthi.

Brian Neman:

A little bit about Sanguine, I started the company about 10 years ago with the idea of working directly with patients. By working directly with patients, we found that it was easier to collect samples and specimen information and the actual specimen itself as well as medical record information and other patient reported outcomes. Because of that direct to patient relationship, it’s yielded for researchers like yourselves higher data coverage per patient, longitudinal information, as well as faster sample and data acquisition. Again, because of the right relationship and the strength of the relationships that we have.

Brian Neman:

The best way to think about us is that we are a platform connecting both researchers and patients to accelerate the research process, through better data and more data coverage per individual. I’m happy to be here. Historically over the past 10 years we have completed over, I think now over 600 research studies across autoimmune conditions, oncology and rare diseases. We’re happy to introduce Shruthi Bhardawaj?

Shruthi S. Bharadwaj:

Bharadwaj.

Brian Neman:

Bharadwaj, excuse me, today. Her experience is in IBD and understanding the onset of colon cancer. Sanguine historically as I mentioned has been involved in specimen collection, but now as we’re getting more into medical record data collection, we’re seeing the use in cure, in patient forward outcomes. We’ve also seen that use in understanding claims and pricing strategy and now in artificial intelligence and machine learning. With that we’re entering that part of the market and seeing value of medical record data in that regard.

Brian Neman:

I would like to introduce Shruthi, as I mentioned she’s focusing on IBD and colon cancer. She’s a senior scientist at Novartis in AI in drug development and she has a patent in machine learning in her approach to, as I just mentioned, colon cancer and IBD. She’s won several NIH grants and has published several abstracts and a book chapter, so we’re really excited to have Shruthi here to talk about her work and how she’s making an impact in IBD and colon cancer. Without further ado, Shruthi.

Shruthi S. Bharadwaj:

Thank you, thank you so much. Okay, before I begin, just a quick show of hands, how many are new to AI, ML? Okay, a few. How many are already working on AI and machine learning concepts? Okay, for those of you this might be a little not too exciting, but please stop me for people who are new to this. Stop me, ask questions, I think that’s how we can get this forward. Like I was telling Raksha a couple of minutes ago, stop me whenever you have any questions.

Shruthi S. Bharadwaj:

Like the introduction said, I am working at Novartis. I work with a small team which focuses on AI and machine learning and using these approaches to clinical trial data analysis.

Shruthi S. Bharadwaj:

Now, before I start, what exactly is healthcare data? Now again, I’m not completely sure about how many of you are already working with healthcare data, how much, how little, so I’ll just start with the most basic thing. From my experience healthcare data combines a lot of different things. It’s not just genomic or proteomic, but it’s a combination of everything. EMR is a big one these days and you need to have a complete picture of the patient before you try to analyze anything. That is what we call precision medicine. That is where you have the EMR data, imaging data.

Shruthi S. Bharadwaj:

The EDC essentially is, these days you have a lot of digitization happening. You have these laptops, you have apps that capture information. All these things combined give us a picture of what the patient looks like. That is the first stage in analyzing any kind of data, getting the data.

Shruthi S. Bharadwaj:

Now, the reason AI and machine learning have gained so much importance is that now we have too much data and we don’t know what to do with it. Now you can only do so much with certain databases or Excel. There was a time when people used to use Excel to do data analysis. These days not possible. Here’s a quick view of how much the healthcare data has grown over the years.

Shruthi S. Bharadwaj:

Now this was just 2013, which is basically yesterday, but since 2013 to as of now you can see the growth, the immense growth of healthcare data. Like I said, this healthcare data combines a lot of things because again we need the bigger picture to do any sort of analysis for drug discovery or any healthcare related analysis.

Shruthi S. Bharadwaj:

Now you might also, if you’re new to OMICs data, you might have heard … These are all buzz words. What essentially is OMIC data? OMIC comprises these three things. Genomic: what is the genetic makeup of a patient, of a human being? That is genomic. Proteomic: protein-protein interactions, how do these happen? These are some of the data points that we need to collect in order to get this bigger picture that I’m referring to.

Shruthi S. Bharadwaj:

Metabolomics also not entirely there but it is absolutely essential to get that bigger picture of a patient and then use that information to treat the patient. Now all this combined is what we have as OMICs data and which is also just a tiny, tiny, tiny part of this entire bigger healthcare clinical data.

Shruthi S. Bharadwaj:

Now the sensor data is actually quite fascinating. Couple of years ago before Novartis I used to work for Thomson Reuters. I don’t know if you are aware of this, Thomson Reuters also used to have this science wing which was essentially working on utilizing sensor data. That was new back then, this was in 2013.

Shruthi S. Bharadwaj:

Now what is sensor data? The watch that I have right now this is collecting a lot of information. It knows my heart rate, it knows how many steps I have. It knows when I stand up, it knows when I’m not breathing, so all that information is being collected somewhere and someone is using it. Now there are also a lot of issues with data collection, some people like it and some people don’t. Now for healthcare reasons it is important that we collect this data and with consent use the data for the right purpose.

Shruthi S. Bharadwaj:

When I say sensor data here I’m referring to the correct use of this healthcare data. My heart rate, I want somebody to look at my heart and say maybe, am I prone to having cardiovascular disease maybe 10 years from now? It’s absolutely possible to predict something like that using certain variables, certain parameters. This basically gives that sort of information for a patient.

Shruthi S. Bharadwaj:

EMR data, now I know you must have heard about EMR data. This is a jungle, because when I was introduced to EMR data the first time, I didn’t know how to understand it. Every time we go to a physician, every time for a general yearly, annual checkup certain things are collected. Now these things you may think are just, okay, what medication, what’s your height, what’s your weight? They check your vitals, but all of this information when you look at it from a data scientist perspective, this is humongous.

Shruthi S. Bharadwaj:

As a patient I may not realize how big this is, but as a data scientist I know how much effort goes into analyzing that kind of data. Forget analyzing, just how much effort goes in getting that data so that it’s analysis ready. Now, the first step is source data, where exactly is all that information that you give to your physician stored? Does anyone know? EMR data? Okay.

Shruthi S. Bharadwaj:

Now there are several different databases that have EMR data. Has anyone heard of Epic system? Great, of course, you have. Okay, Epic is one, Cerner is another, so there are a lot of these things. Between Epic and other systems there are a lot of differences, so combining data, actually I would be really excited to know how your company does that. Combining these different kinds of data sources together is a humongous job. That’s something that takes data scientists more time than actual analysis of the data. If somebody is giving you the data, great.

Shruthi S. Bharadwaj:

Now once you have the sourced data, like I was saying data extraction, so now these are in very, very various different formats. Now you might know, when you go to a physician most physicians these days have a laptop open and they’re typing something. You say you have a fever, the doctor isn’t looking at you but the physician is actually typing on their laptop. The reason that they’re doing that is they’re capturing every little thing.

Shruthi S. Bharadwaj:

Now in the olden days, let’s say maybe 15 years ago or maybe even 10 years ago that wasn’t the case. 15 years ago the doctor would come in, look at you, you would tell them whatever you were feeling that day and they would actually write it down on a piece of paper. Now how do you get that information from that piece of paper that your physician has written down? How do you pass that information to let’s say your next physician?

Shruthi S. Bharadwaj:

Let’s say you want to go to a cardiologist and your PCP wrote something down on a piece of paper. How does your cardiologist know what the PCP wrote down? Now that is very critical information. How do you get that information all the way to your cardiologist so that they know what medication to prescribe? That’s where you might have heard natural language processing came in. Has anyone heard of NLP? Great, great.

Shruthi S. Bharadwaj:

Natural language processing, essentially, for people who haven’t heard of it, is extracting this information from that piece of paper. Someone has written something down, you are extracting that text, you are extracting those words, those characters and then you’re making sense of it.
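
A minimal sketch of the idea described above: pulling a few structured fields out of a free-text note so they can sit in a table. The note text, the field names, and the regular expressions are all illustrative assumptions; real clinical NLP is far more involved than pattern matching.

```python
import re

# Hypothetical free-text note; the patterns below are illustrative only.
NOTE = "Pt reports fever x2 days. Temp 101.2 F, HR 96 bpm, BP 128/84. Started ibuprofen 400 mg."

# One pattern per column we want in the tabular output (assumed field names).
PATTERNS = {
    "temperature_f": re.compile(r"Temp\s+(\d+(?:\.\d+)?)\s*F"),
    "heart_rate_bpm": re.compile(r"HR\s+(\d+)\s*bpm"),
    "blood_pressure": re.compile(r"BP\s+(\d+/\d+)"),
}


def extract_fields(note: str) -> dict:
    """Return one tabular row (a dict) holding the fields found in the note."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        # Fields that are not mentioned in the note are recorded as None.
        row[name] = match.group(1) if match else None
    return row


row = extract_fields(NOTE)
print(row)
```

Each note becomes one row, and a column stays empty when the note never mentions that field.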

Shruthi S. Bharadwaj:

Now this NLP is its own field. I mean it’s a huge field because obviously it takes a lot of effort to get that information from the piece of paper to a tabular format so that that information can then be analyzed. There are multiple, multiple steps before you get to an analysis and then from that analysis, from those results is when you go to the treatment. From getting the information to treatment there are so many, so many steps and which is what I will be covering today. For EMR data there’s one … Yes, please.

Speaker 3:

Before you’re too far, I just wanted to ask you, with some of the more consumer oriented devices like the watch and et cetera, how do you ensure data integrity? For example, you have so much credibility in capturing heart rate on your watch that you don’t really know if the data is [crosstalk 00:11:51].

Shruthi S. Bharadwaj:

Accurate or not, yup.

Speaker 3:

Then with the EMR and the source data I think it’s probably many people who’ve probably experienced to go to one hospital and collect one set of information. Maybe you’re in another hospital and that information has been transferred even within systems like [inaudible 00:12:06] date of transfer. Then there is a blossom [inaudible 00:12:13] occurrence from one [inaudible 00:12:14] to the other.

Speaker 3:

Then of course the patient is not always mindful of what they’re taking, so they don’t remember the dose they’re on. They don’t remember the frequency, so all this data error that’s going in, how are you correcting for it I guess and ensuring that there isn’t all this garbage in and garbage out?

Shruthi S. Bharadwaj:

Okay, so I’ll answer both of those questions, which basically have very similar answers. To begin with there is no one exact answer to that question, because there really is no one solution to it. How do you correct the error, but there are certain things that you can do to mitigate these errors. They may not completely remove them, but at least minimize them a little bit.

Shruthi S. Bharadwaj:

One way that from my experience that I have seen is the amount of data that you collect, for example you gave a great example, heart rate. The amount of data that you collect just from me, let’s say if you collect my heart rate every minute versus let’s say every two hours, difference. If you are collecting my heart rate every minute, you have that amount of data available for you. It’s easier to look at outliers for example, that’s a big one, because if my heart rate all of a sudden drops to 20 when let’s say average is 120, I’m just giving you a number, something’s not right. That is one of the, I think easier ways to deal with that issue.
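
The outlier check described above can be sketched in a few lines, assuming minute-by-minute readings and a simple rule: flag anything far from the median. The function name `flag_outliers`, the 40 bpm threshold, and the sample readings are assumptions for illustration, not values from the talk.

```python
from statistics import median


def flag_outliers(readings, max_deviation=40):
    """Return indices of readings more than max_deviation bpm from the median."""
    center = median(readings)
    return [i for i, value in enumerate(readings) if abs(value - center) > max_deviation]


# Hypothetical minute-by-minute heart-rate readings with one sudden drop.
heart_rate = [118, 121, 120, 119, 20, 122, 120]
print(flag_outliers(heart_rate))
```

With frequent sampling, a single implausible reading stands out against its neighbors; with one reading every two hours there is much less context to judge it against.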

Shruthi S. Bharadwaj:

With EMR, good thing that you brought it up because I went through the same kind of situation a few years ago now. You’re absolutely right, even within hospitals data doesn’t get transferred. These days I have to say fortunately that has improved quite a bit. Previously like I said piece of paper would be lost or nobody would put it somewhere. The files would be gone so somebody had to go back, hunt for these pieces of paper, try to figure out what it was. These days because the physicians are actually doing a great job, typing everything in, it’s a lot easier to get that information. That doesn’t mean there isn’t any error, like you said, patient themselves.

Shruthi S. Bharadwaj:

For example, me, I’ll take myself as an example. There are all these … I don’t know if you guys know of these apps, healthcare apps that Google has one, I think Microsoft Health also has one. Where you can answer a few questions and based on those questions it gives you a diagnosis. I mean they do say take it with a grain of salt and by the way which you should, but there are ways of how you can ask these questions.

Shruthi S. Bharadwaj:

I’ll give you an example from Alzheimer’s disease. Now a couple of years ago I was working on this Alzheimer’s project. There is something called quality of life, has anyone taken or is aware of quality of life questionnaire? Great. Now, if you look at that quality of life questionnaire and if you try to answer these questions, so I may get 40 questions but someone else might get 140 questions. Does anyone know why that is the case or that could be the case?

Shruthi S. Bharadwaj:

Okay, the clear answer here is, if I am answering something and it looks, the collection, when the data is being collected, it looks somewhat consistent it stops asking you questions. Let’s say, I’ll give you three questions. One of the neural Alzheimer’s patients have tremors, one of the questions could be, “Can you lift this pen?” Let’s say I say, “Yes,” and then next question comes up it says, “Can you lift this paper?” I say, “No,” there is an inconsistency right there. Now the data collecting machine and I’ll just call it a machine at this point, the database it knows something’s not right.

Shruthi S. Bharadwaj:

If a patient is able to pick this up but not this up something’s not right. Either the patient isn’t answering right or something’s up. That is when I get 140 questions where Raksha might only get 40 because she’s being honest or she’s answering the questions right.
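
One way to picture the adaptive questionnaire described above is a rule table: related questions are expected to get the same answer, and a contradiction triggers extra questions. The question names, the pairing, and the follow-ups below are hypothetical; the talk does not say how any real instrument implements this.

```python
# Hypothetical consistency rules: each pair of questions should get the same answer.
RELATED_PAIRS = [("lift_pen", "lift_paper")]

# Hypothetical follow-up questions asked when a pair is answered inconsistently.
FOLLOW_UPS = {("lift_pen", "lift_paper"): ["lift_cup", "hold_spoon"]}


def questions_to_add(answers):
    """Return follow-up questions triggered by contradictory answers."""
    extra = []
    for first, second in RELATED_PAIRS:
        both_answered = first in answers and second in answers
        if both_answered and answers[first] != answers[second]:
            extra.extend(FOLLOW_UPS[(first, second)])
    return extra


print(questions_to_add({"lift_pen": True, "lift_paper": False}))
```

A respondent whose answers agree gets no extra questions, which is why two people can see questionnaires of very different lengths.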

Shruthi S. Bharadwaj:

Like I said, going back to your questions, these are some of the things that have been employed to minimize error. There’s a very, very long way to go before we can say with certainty that there is no error. I have to say maybe there will never come such a time where we can say there is absolutely no error. That is the state of AI machine learning. It’s not even AI machine learning, it’s just data collection at this point.

Shruthi S. Bharadwaj:

What we have to understand is that, it’s not artificial intelligence or machine learning that we’re trying to judge, it’s the data collection. Your algorithm is only as good as your data, so if we can achieve a situation where we are absolutely collecting good data then your algorithms are great. I mean you can predict whatever you want to predict, you can do anything you want, but it’s all about the data. Does that answer your question?

Speaker 3:

Yeah.

Shruthi S. Bharadwaj:

Perfect. Any other questions? All right, now like I said, there’s data mapping and then you finally export this EMR data. Now when you export this EMR data there are different formats that this can be exported into. Some of these are tabular, like I said, if something’s written on a piece of paper, you are going to employ other artificial intelligence approaches like NLP. Then you extract that information so that you have this entire EMR data for a patient.

Shruthi S. Bharadwaj:

Now going back to this, this is only a small, little blob that’s part of this bigger picture of a patient profile.

Speaker 4:

What are the most common ontology systems [inaudible 00:18:13]?

Shruthi S. Bharadwaj:

From my thing, LOINC, CDISC. LOINC for lab codes, I mean essentially you have CDISC codes for all these disease domains. From my experience, these are the most common ones used.

Speaker 4:

Do you have SNOMED [crosstalk 00:18:29]?

Shruthi S. Bharadwaj:

SNOMED also for disease ontologies, I forgot to put that there, but absolutely.

Speaker 4:

RxNorm?

Shruthi S. Bharadwaj:

RxNorm absolutely, I only gave three, but RxNorm for medications, SNOMED for any disease hierarchies absolutely, yeah. Imaging data, so I put this last because I do have a use case that I would like to share with you guys. Imaging data is another big monster. The rest of the data, clinical data can be in tabular forms, but getting this imaging data into tabular form takes a lot of, a lot of, a lot of time.

Shruthi S. Bharadwaj:

Every time we go to a doctor we get CT scans or x-rays or MRIs; all of these are just images. There’s very little information that you can just get in tabular form. The other thing is, let’s say I get an MRI at Harvard Medical, you know, hospital and then my friend gets another MRI somewhere else. Those two MRIs are not the same because every MRI machine does things differently. Now combining all this and making sense of all this and normalizing this information takes way longer than actually analyzing this data.

Shruthi S. Bharadwaj:

Imaging data, normalization, feature extractions, once you normalize this you can extract the features. Now what features am I talking about? The example I have is colon cancer example, either colon cancer, you’ll see that in a minute but let’s take another example. Let’s take MRI, so I get a brain scan. What is it that I can see in that picture? Maybe I have a tumor somewhere, so how exactly is the physician seeing the tumor? The pixelization, it’s lighter color or darker color, the area around it is of a different color.

Shruthi S. Bharadwaj:

These are some of the features that you have to extract to make sure how big the tumor is or how dense the tumor is. How do you know how dense the tumor is? How do you gauge the porosity of that tumor? These are all the features that you need to extract before you make any clinical decision from a piece of two dimensional image. That is the challenge that we see these days with imaging data, how do we get that from a two dimensional image?
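
To make "extracting features from a two dimensional image" concrete, here is a toy sketch that turns a grid of pixel intensities into a few tabular numbers: the size of the bright region, its average brightness, and where it sits. The 5x5 image, the threshold of 128, and the feature names are assumptions for illustration; real pipelines would normalize scans across machines first and use far richer features.

```python
# Hypothetical 5x5 grayscale image (0-255); brighter pixels stand in for a lesion.
IMAGE = [
    [10, 12, 11, 10, 9],
    [11, 200, 210, 12, 10],
    [10, 205, 220, 11, 10],
    [12, 11, 10, 10, 11],
    [10, 10, 12, 11, 10],
]


def region_features(image, threshold=128):
    """Summarize pixels brighter than threshold: area, mean intensity, bounding box."""
    pixels = [
        (r, c, value)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]
    if not pixels:
        return {"area": 0, "mean_intensity": None, "bounding_box": None}
    rows = [r for r, _, _ in pixels]
    cols = [c for _, c, _ in pixels]
    return {
        "area": len(pixels),  # number of bright pixels, a proxy for size
        "mean_intensity": sum(v for _, _, v in pixels) / len(pixels),
        "bounding_box": (min(rows), min(cols), max(rows), max(cols)),
    }


features = region_features(IMAGE)
print(features)
```

The resulting dictionary is one row of a table, which is the form a classifier can consume.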

Shruthi S. Bharadwaj:

Then classification is a machine learning let’s say algorithm where you can say yes cancer or no cancer, that’s a binary classifier. How do you get to that? You need to do these three before you get to the classification step. Now the use case that I have, oh before that data integration, there you go. This I think we have already discussed. Data integration only happens after you have collected all the other information that I just mentioned. EMR data, imaging data, OMICs data, genomic, proteomic, metabolomic data as well as just plain old simple clinical trial data.

Shruthi S. Bharadwaj:

Now how many of you work on clinical trials? Okay, one, two, so when you talk about clinical trial most of these could be tabular information. You have like a patient ID and you have a bunch of rows, so it’s basically in Excel format. That’s what I consider to be an easy peasy way of looking at data, because looking at all the other things, I think clinical trial data is a lot easier.

Shruthi S. Bharadwaj:

Once you have all that information, clinical trial, OMIC, all of those imaging and EMR, that needs to go into some database where a scientist like me can go get that information and then analyze the information. That goes to a database where it can be queried. Before that goes into database you have to process it. How do you know me, patient is the same patient who has imaging data or who has EMR data and who has a clinical trial data? How do you map patients? That’s another big thing that you need to do before you get to the analysis step. Building that database also takes a lot, lot longer.
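
A minimal sketch of the patient-mapping step, assuming the EMR and imaging systems use different IDs and the only shared fields are a name and a date of birth. The matching rule (normalized name plus date of birth), the field names, and the sample records are assumptions; production record linkage typically needs fuzzier, probabilistic matching.

```python
def link_key(record):
    """Build a matching key from normalized name and date of birth (an assumed rule)."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])


def link_patients(emr_records, imaging_records):
    """Return (emr_id, imaging_id) pairs whose link keys agree."""
    emr_by_key = {link_key(r): r["emr_id"] for r in emr_records}
    return [
        (emr_by_key[link_key(r)], r["imaging_id"])
        for r in imaging_records
        if link_key(r) in emr_by_key
    ]


# Hypothetical records from two systems that do not share an identifier.
emr = [
    {"emr_id": "E1", "name": "Jane  Doe", "dob": "1980-02-01"},
    {"emr_id": "E2", "name": "John Roe", "dob": "1975-07-15"},
]
imaging = [
    {"imaging_id": "IMG-9", "name": "jane doe", "dob": "1980-02-01"},
    {"imaging_id": "IMG-3", "name": "Sam Poe", "dob": "1990-01-01"},
]

links = link_patients(emr, imaging)
print(links)
```

Records that fail to match are left out rather than guessed at, so a person can review them before they enter the analysis database.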

Shruthi S. Bharadwaj:

The reason why I’m focusing so much on just getting the data, integrating the data is so that we understand that artificial intelligence and machine learning is just a tool. It’s not magic, it’s not something that can give you what you want. It can and you can use it very well, only if you have all this information good to go. Clean information, clean database, clean data and then you can apply your algorithm anyway you want to.

Shruthi S. Bharadwaj:

Challenges I think I’ve already mentioned, syncing all these different data sources. We discussed EMR data where we have to sync from different kinds of EMR like Cerner and Epic those were the examples. Getting data into this big data structure, or we keep hearing this term, big data, big data. What exactly is this big data that we talk about especially in healthcare? Big data essentially means a combination of all of these things that I just mentioned in a big let’s say Excel file, that is what we consider big data.

Shruthi S. Bharadwaj:

The other biggest challenge we have is extracting the relevant information from this big data. How do we know that let’s say you’re collecting some information from me as a patient, how do you know that is relevant? How do you know that that information will lead you into diagnosing something? Let’s say I have a heart attack or a heart disease or something, how do you know? Can you just get that information from the heart beat that you have here? Do you need other information? Do you need blood work? What exactly is needed to get that and how are you making sure you are getting that relevant information? These are the biggest challenges that we see today with healthcare data.

Shruthi S. Bharadwaj:

All right, now comes the fun part. Once you’re done with all of this data collection, making sure the integrity of data is good, that’s when you can apply these three things. I just put knowledge graph there because that’s also one of these buzz words that go on these days. Has anyone heard of knowledge graph? Two, three people, great. We’ll just start with artificial intelligence.

Shruthi S. Bharadwaj:

Here in this room, what do people think of when I say artificial intelligence, what is it? Anyone? Anyone wants to just take a guess? No? Good, because even I don’t know. The thing is, with artificial intelligence people have various, various definitions. Some people like let’s say Google is using artificial intelligence to do very different kind of things and so is Microsoft, so is Novartis now.

Shruthi S. Bharadwaj:

What a pharma company would consider artificial intelligence may not be what Microsoft or Google may consider. Now Facebook is collecting all the information that we, all the pictures that we post on Facebook, Facebook is collecting all that information. I don’t know if you guys have noticed, let’s say on Amazon you were looking for something, a purse. The next day or maybe even in a couple of hours you’ll see that ad showing up on your Facebook. Has anyone wondered why that happens? Exactly.

Shruthi S. Bharadwaj:

Data is being collected no matter what you do, whatever you do, a simple thing like Facebook, although that is not simple, it’s collecting a lot of information and so is Google. Let’s say you googled something, you googled a specific gene or you googled a specific heart condition. Or let’s say colon cancer, you will see an ad for a colorectal examination or something at some point, it’s going to happen.

Shruthi S. Bharadwaj:

Artificial intelligence has a lot of meaning, going back to that point. For us, for a pharma company, artificial intelligence could just mean natural language processing. That is where pharma companies are at the moment, so what is relevant to us is what that physician writes on that piece of paper. That is what we need.

Shruthi S. Bharadwaj:

Right now at this given moment, we don’t really care so much about what the patient is typing on their Facebook. Although I have to say at some point that is going to be very, very useful because you might be surprised at how much information we give out on Facebook or other mediums like that or Instagram or something. If I have a stomach pain maybe I just say, “Hey Chelsea, I have a stomach pain.” Guess what, that information is being collected. Now Facebook knows that you have a stomach ache and you will get an ad for that. That’s artificial intelligence.

Shruthi S. Bharadwaj:

Machine learning, any guesses on what machine learning is? Okay, I will give you the answer. Machine learning like the name is basically how you teach a machine, that is all it is. How are you teaching a computer? That is essentially what machine learning means. In machine learning, let’s say we have this laptop, I have a laptop. How do I teach that laptop that does not have a brain like us, how do we teach that thing to distinguish between let’s say a cat and a dog?

Shruthi S. Bharadwaj:

There is something called labeled data. Does anyone know what that means? No? Okay, labeled data essentially means you have a certain condition, I’ll give you an example of oncology. I am a patient, I have a heart rate of 20, let’s say I have a heart rate of 20 and I have some blood report which says something’s not right, blah, blah, blah. Then I have been diagnosed with colon cancer, now that is my report. What I tell the machine is, “Hey, this patient, Shruthi has this heart rate, this blood work and she has been diagnosed with colon cancer.” Now I have labeled patient, me, with colon cancer. Now that is labeled so that is supervised.
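
As a rough illustration of learning from labeled rows, the sketch below uses a nearest-centroid classifier: it averages the features of each labeled group, then assigns a new row to the closest average. This is a stand-in technique chosen for brevity, not a method named in the talk, and the feature values and labels are made up.

```python
def train_centroids(rows, labels):
    """Average the feature vectors of each label (nearest-centroid training)."""
    sums, counts = {}, {}
    for features, label in zip(rows, labels):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=distance)


# Hypothetical labeled rows: [heart rate, made-up blood marker].
rows = [[72, 1.0], [68, 1.2], [95, 4.8], [101, 5.1]]
labels = ["no cancer", "no cancer", "cancer", "cancer"]

centroids = train_centroids(rows, labels)
print(predict(centroids, [99, 5.0]))
```

The label column is what makes this supervised: every training row tells the machine the answer it is supposed to learn.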

Shruthi S. Bharadwaj:

Now what is unsupervised learning? Exactly the opposite. Now I give my laptop a set of information. It is an Excel sheet let’s say a tabular format, but I don’t know what I have. I have all the other information but just that diagnosis column is missing. There’s blood work, there’s genomic data, there’s imaging data, there’s all of that data but that diagnosis does not exist. Now I’m asking my laptop or my machine to tell me, “Hey, do I have cancer or not?” That is what we call as unsupervised learning, machine learning.
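
In the usual textbook sense, an unsupervised method is given rows with no label column at all and looks for structure on its own, most commonly by clustering. The sketch below is a small k-means loop with two clusters over a single made-up blood marker; the values and starting centers are assumptions. Note that it only groups similar rows and does not name a diagnosis for either group.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a small k-means loop (k=2)."""
    low, high = min(values), max(values)  # deterministic starting centers
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to whichever center is nearer.
            groups[0 if abs(v - low) <= abs(v - high) else 1].append(v)
        # Move each center to the mean of its group (skip empty groups).
        if groups[0]:
            low = sum(groups[0]) / len(groups[0])
        if groups[1]:
            high = sum(groups[1]) / len(groups[1])
    return groups


# Hypothetical unlabeled blood-marker values.
markers = [1.0, 1.2, 0.9, 4.8, 5.1, 5.0]
clusters = two_means(markers)
print(clusters)
```

Deciding what each cluster means clinically still takes a person, or labeled data, which is where supervised learning comes back in.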

Shruthi S. Bharadwaj:

Knowledge graph, anyone has heard at all about this word? No? Okay, so knowledge graph is a very funny, I have to say it’s quite funny. Knowledge graph essentially is a database. You guys know what a database is right? It’s basically a big database. It’s a warehouse where you can input all this information in a tabular format. Knowledge graph is also a database but it has a little kick to it.

Shruthi S. Bharadwaj:

The reason why it’s a little different is because it can incorporate several different kinds of information. I was saying there is imaging, there is EMR, there is some other thing. Let’s say all these have not been mapped to a single patient, so it’s not in a tabular format. I’ll just use that for a change because I’m bored of this.

Shruthi S. Bharadwaj:

Let’s say there’s EMR here and then there’s imaging here. Now, in a traditional database this will be like let’s say patient ID, EMR information let’s say age. From imaging, you’re getting some diagnosis. This is what a traditional database would look like.

Shruthi S. Bharadwaj:

Now it’s unlikely that you’ll get all this information that cleanly so that you can use this. This is dream come true for data scientists. What usually happens is, you have EMR, you have imaging, you have some other data that you don’t know. There’s NLP data that’s coming from some other thing so there’s a lot of different things that are just sort of bombarded at you. As a data scientist what do you do when you see something like that? You say, “Okay, now I don’t know what to do,” but that’s when knowledge graph comes in.

Shruthi S. Bharadwaj:

Knowledge graph essentially what it does is, it draws connection like that to each and every blob. Knowledge graph creates these two way associations where you can say, “Okay, this patient EMR sort of looks like this patient who also has imaging that corresponds to this patient who has this doctor’s note.” It’s drawing this sort of not conclusion but just associations between these blobs.

Shruthi S. Bharadwaj:

This is not conclusive, it’s not that you can say, “Oh knowledge graph it just did dah, dah, dah, dah, dah and there you go that’s the patient,” no. What it does is it gives you an inclination, it gives you a direction in seeing, okay, maybe this patient, this patient, this patient, this patient are one and the same. Maybe there is some connection there. Maybe patients who have this specific kind of blood work also seem to have this specific kind of imaging data, also seem to have this specific kind of doctor’s note. It’s basically just drawing those associations.
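
The "two way associations between blobs" can be pictured as a small graph: each record is a node, and each suspected link is an edge that can be followed in either direction. The class below is a toy sketch, not any particular knowledge graph product; the node names and relation labels are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny undirected graph of labeled associations between records."""

    def __init__(self):
        self.edges = defaultdict(set)

    def associate(self, a, b, relation):
        # Store the link in both directions so it can be followed either way.
        self.edges[a].add((relation, b))
        self.edges[b].add((relation, a))

    def connected(self, start):
        """Return every node reachable from start by following associations."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for _, neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen - {start}


# Hypothetical records from three sources, linked by suspected associations.
graph = KnowledgeGraph()
graph.associate("emr:123", "imaging:A7", "similar_profile")
graph.associate("imaging:A7", "note:55", "same_visit_date")
graph.associate("emr:999", "imaging:B2", "similar_profile")

print(sorted(graph.connected("emr:123")))
```

The output is a list of candidates that may belong to the same patient, which matches the point made above: the graph suggests associations, and a person still has to confirm them.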

Shruthi S. Bharadwaj:

Now the knowledge graph is an excellent tool, but like I said, it’s always better to crosscheck. We’re not in a position where we can leave things to machines, we still need humans, which is where we come in. That is what it is. Questions so far?

Speaker 5:

It’s a question, [inaudible 00:32:58].

Shruthi S. Bharadwaj:

Could happen.

Speaker 5:

Yeah, because I’m wondering because in controls something like trial you’ll have [inaudible 00:33:04]. I’m not sure if this will be beneficial to a trial [crosstalk 00:33:04] diagnosed patient that you’re looking for.

Shruthi S. Bharadwaj:

Correct, so unsupervised doesn’t necessarily have to be just for diagnosis. Let’s say you have diagnosis information, but you want to find something else. The example I just gave was, if I want to predict if someone is diagnosed with colon cancer, that’s easy because you have that information in the clinical trial dataset. Let’s say you need to predict something else.

Shruthi S. Bharadwaj:

Let’s say you have diagnosis of colon cancer, but you want to see or let’s say IBD because IBD usually progresses to colon cancer. Let’s say we have an IBD clinical trial dataset where we don’t know if a patient is diagnosed with colon cancer. You wanted to predict if that’s possible using rest of the data, now that would be an unsupervised learning because you are trying to predict something that does not exist in that dataset. Does that answer your question?

Speaker 5:

[inaudible 00:34:16].

Shruthi S. Bharadwaj:

This knowledge graph is not for diagnosis, it’s not for predicting. Knowledge graph is just a database. What it’s doing is, with those blobs it’s nudging you into creating something like this. It’s doing this, this, this, this, this so that you can create this data file. From this data file is where you get to do the actual analysis and predict and do all those funny things. Does that answer your question? Any other questions? Yes?

Speaker 6:

It’s a two part question, [inaudible 00:34:55]?

Shruthi S. Bharadwaj:

Sorry, I didn’t hear the last part.

Speaker 6:

The data organization part, [inaudible 00:35:07].

Shruthi S. Bharadwaj:

The first is semantic, so does it include?

Speaker 6:

Semantics of what you [inaudible 00:35:15], does it have a language associated with that [inaudible 00:35:21]?

Shruthi S. Bharadwaj:

Yes, the answer is yes. It does, so I won’t call it normalization, but there’s absolutely a translation. Everything that you’re collecting into the knowledge graph database needs to go through this. It’s very similar to a query language if you have done any database query. That language, the syntax looks very similar to that, but there’s a terminology, there’s ontologies that basically help the user to incorporate all the data. Now second part of the question was?

Speaker 6:

Just to follow up to that, is that semantic networks 2.0 again or is it something completely different?

Shruthi S. Bharadwaj:

You have multiple ways of doing it, but that could also be one of them. What I view from my experience with knowledge graph I have not used that. That theory exists so that absolutely you need something to incorporate all this information otherwise it doesn’t work.

Speaker 6:

The second part question is about data organization where when you have multiple databases most of the recent technology such as data [inaudible 00:36:30] will connect that [inaudible 00:36:32].

Shruthi S. Bharadwaj:

Mapping yeah, correct.

Speaker 6:

Is that how it’s done on knowledge graph or it’s different?

Shruthi S. Bharadwaj:

No, it’s different. What you’re saying you can do with, say, just a simple R script. Let’s say you have a patient ID in the EMR, a patient ID in the imaging, and all these are tabular things. This is an Excel file, let’s say, and so is this, so is this, so is this, so is this. Each one of those Excel files has a patient ID column, if that’s the example. In that case it’s easy to connect everything; you can map.
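As an illustration of the simple case described here, the mapping can be sketched in a few lines. The talk mentions an R script; this sketch uses Python instead, and the table contents, column names, and the `join_on_patient_id` helper are all hypothetical.

```python
# Hypothetical tables: each is a list of rows that share a "patient_id" column.
emr = [
    {"patient_id": "P001", "diagnosis": "IBD"},
    {"patient_id": "P002", "diagnosis": "IBD"},
]
imaging = [
    {"patient_id": "P001", "modality": "CT"},
    {"patient_id": "P003", "modality": "MRI"},
]


def join_on_patient_id(*tables):
    """Merge rows from several tables into one record per patient ID."""
    merged = {}
    for table in tables:
        for row in table:
            # Create the patient's record on first sight, then add columns.
            record = merged.setdefault(row["patient_id"], {})
            record.update(row)
    return merged


combined = join_on_patient_id(emr, imaging)
print(combined["P001"])
```

Patients that appear in only one table, such as P002 and P003 here, keep only the columns from that table.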

Speaker 6:

It can be from two [inaudible 00:37:12] who cannot share the patient ID number.

Shruthi S. Bharadwaj:

Just anonymous ID, okay, in that case you cannot use patient ID as the mapping. Absolutely, in that case knowledge graph is useful, because what essentially knowledge graph will do is it will look at other variables. There’s some artificial intelligence in knowledge graph, so it looks at these patterns. That again is only good.

Shruthi S. Bharadwaj:

The amount of data really has to be humongous for this to work. You cannot do this with, let’s say, just one clinical trial’s data. It’ll be too little data to use something like this. This is truly a big data problem, but let’s say you do have that much data, then this is something that you can use to create these patterns. It will actually tell you, “Okay, this patient looks a lot similar to that patient in that database,” so maybe draw a connection there or draw an association there. That’s the whole purpose of knowledge graphs essentially.
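The idea of suggesting a link between records that share no ID can be sketched with a plain nearest-neighbor comparison on shared features. This is only an illustration of the concept, not how any particular knowledge graph product works; the records, feature names, and helper functions below are hypothetical, and two records per database is far too little data for real use, as the talk stresses.

```python
import math

# Hypothetical records from two databases with no shared patient ID.
db_a = {
    "A1": {"age": 54, "crp": 12.0, "hemoglobin": 11.2},
    "A2": {"age": 31, "crp": 2.1, "hemoglobin": 14.8},
}
db_b = {
    "B1": {"age": 30, "crp": 2.4, "hemoglobin": 15.0},
    "B2": {"age": 55, "crp": 11.5, "hemoglobin": 11.0},
}

FEATURES = ("age", "crp", "hemoglobin")


def distance(r1, r2):
    """Euclidean distance over the shared features (unscaled, for brevity)."""
    return math.sqrt(sum((r1[f] - r2[f]) ** 2 for f in FEATURES))


def suggest_links(source, target):
    """For each source record, suggest the most similar target record."""
    return {
        sid: min(target, key=lambda tid: distance(rec, target[tid]))
        for sid, rec in source.items()
    }


links = suggest_links(db_a, db_b)
print(links)
```

The result is a suggestion to cross-check, not a confirmed match. A real pipeline would at least scale the features so that one variable does not dominate the distance.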

Speaker 7:

This question, so from my understanding, knowledge graph is a higher level of clustering kind of thing. It’s a multivariable clustering?

Shruthi S. Bharadwaj:

I wouldn’t call it clustering per se as in machine learning clustering. It uses similar theory but all it’s doing is it’s sort of associating the patients or groups of patients like clustering. When you use the word clustering it essentially means it’s putting these blobs into yeah. No, it’s not even close to that, which is why it’s helping you figure out this big Excel file but it is not doing that PCA, no.

Speaker 7:

Another question, does that solve the problem of putting together multiple data from multiple centers? Where you’ll have multiple databases normalized differently in different centers [crosstalk 00:39:14] would that be solved by using this knowledge graph?

Shruthi S. Bharadwaj:

In theory yes, so it has to do with how much data you’re collecting. Let’s say you’re collecting 10 patient data from institute one to institution two, 10, 10, 10, no it’s not going to work. You need to have enough data so that this can see that pattern. It won’t see that pattern let’s say even with 100 patients, it’s not going to happen. It has to be a lot more and that pattern has to be between features and not just patients. That is how it’s associating through features and not through patients. Make sense, right? All right any other questions?

Speaker 8:

Not so much a question, just like a discussion point. Like in chemistry, so in research, looking at those three and then looking at knowledge graph, it’s probably tying together, let’s say, previous chemotypes for certain drugs. Then looking for patents that are all these compounds that say which toxic group and you know.

Shruthi S. Bharadwaj:

You can.

Speaker 8:

I’m thinking of …

Shruthi S. Bharadwaj:

You can also just use NLP for that, so you don’t even have to go as far as knowledge graph for that. That’s a great example, so oncology again, let’s say, colon cancer. For colon cancer you have some clinical trial data, or you have these molecules or the synthetic molecules that you’re creating. You have some patents where things have already been created and you want to look at these patterns.

Shruthi S. Bharadwaj:

Natural language processing essentially can read through these patents so these should be online. Or even if they are hard copy, PDF files, natural language processing can basically go in. Let’s say you are looking for molecule X, Y, Z, I mean let’s say that’s what you’re looking for. It can go through, read through each of these patents, each of these publications, each of whatever you want it to read and pick out that X, Y, Z from everywhere.

Shruthi S. Bharadwaj:

That tells you, “Okay, this X, Y, Z molecule is what I’m working on. There’s a patent that also has X, Y, Z in it.” You can make it as complicated as you would like, but the more complicated you make it, the fewer results you will get. That’s essentially how it is. Does that answer your question?
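The simplest version of what is described here, picking out a molecule name from a pile of documents, can be sketched as a keyword search. This is plain pattern matching, much simpler than a full natural language processing pipeline, and the document texts, the molecule name "XYZ-123", and the `find_mentions` helper are all hypothetical.

```python
import re

# Hypothetical document texts, already extracted from patents and papers.
documents = {
    "patent_001": "A synthesis route for compound XYZ-123 and related analogs.",
    "patent_002": "Methods of treating inflammation with compound ABC-9.",
    "paper_017": "We evaluated xyz-123 in a colon cancer model.",
}


def find_mentions(term, docs):
    """Return the names of documents that mention the term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in docs.items() if pattern.search(text))


hits = find_mentions("XYZ-123", documents)
print(hits)
```

Adding more conditions to the search (for example, requiring a second term in the same document) narrows the hits, which matches the point that a more complicated query returns fewer results.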

Speaker 8:

Yes.

Shruthi S. Bharadwaj:

All right, use case. This is a pretty picture that I copied from my previous publication. What you see here is colon cancer tissue; the top two are H&E stained. The bottom two are exactly the same tissues, but they have been remeasured using scanning electron microscopy.

Shruthi S. Bharadwaj:

The difference you see here is, these are decellularized, so it’s just the extracellular matrix that you see here. Does anyone understand? Without the cells, decellularized extracellular matrix. Let me give you a little background on what I was trying to do when I was trying to get these images.

Shruthi S. Bharadwaj:

Essentially this is part of the patent that I have, but what it is, is let’s say I’m a patient. I have IBD, inflammatory bowel disease. Now historically and in literature there is a lot of connection between IBD and colon cancer. Almost 20%, this was 10 years ago, there was a 20% chance that a patient with IBD would develop colon cancer within 10 years.

Shruthi S. Bharadwaj:

Now knowing that statistic back then I was thinking, “What if there is a way that you can look at tissue samples of IBD patients and then somehow try to predict how likely is this patient to develop colon cancer in 10 years? Is it there? Is it not there? If it is, what grade of colon cancer?” That’s essentially what my thesis was, that’s essentially what the patent is.

Shruthi S. Bharadwaj:

Going back to this, one of the first things that I had to do was, if you have seen images, just tissue sample images, these are two dimensional. What I was trying to get here was, within a tissue sample, does tissue porosity, do tissue features like the protrusions within the tissue sample, do they provide us information about the onset of colon cancer in patients with IBD? Now that was the broader question that I was trying to answer.

Shruthi S. Bharadwaj:

What you see here is a normal tissue, what I consider normal is noncarcinogenic or non-tumor tissue, which means these are all IBD patients. The top picture that you see is of a normal non-colon cancer patient. The bottom is of a cancer patient.

Shruthi S. Bharadwaj:

Now these images are two dimensional. How do we get from the two dimensional images to three dimensional because that is essentially what I was looking for. Now if a tissue is rough, like a lot of protrusions, how do I measure those protrusions? You need to get that three dimensional image. That was my first experience with image processing as well as these days we call it artificial intelligence.

Shruthi S. Bharadwaj:

Now the way we did that was, using the two dimensional images, so we had about 500 patients. 500 normal, 500 colon cancer patients. For each of these patients we created this tilt. We programmed it in such a way that for every image it would create a 10 degree tilt this way, 10 degree tilt that way. That would basically get us to something that looks like this, three dimensional.

Shruthi S. Bharadwaj:

Now from the three dimensional data, we could extract the little protrusions. I mean it’s important to know that these protrusions were in nanoscale, so tiny, tiny, tiny protrusions, but nonetheless extremely important information.

Shruthi S. Bharadwaj:

Now porosity also, so one of the big giveaways when I was looking at this was, look at the porosity. Porosity essentially is how quickly cells can penetrate through the extracellular matrix, so that’s essentially what metastasis is. If a cell, if the porosity is almost, look at this, almost three times. In a cancer patient the extracellular matrix was about 60% porous, which means it was that much easier for the cells to just go through. In a normal, the cells just wouldn’t go through because that’s just the ideal normal way of being.
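One common way to put a number like "60% porous" on an image is to compute the fraction of the imaged area that is open pore space. The sketch below assumes the image has already been thresholded into a binary mask; the mask values and the `porosity` helper are hypothetical, and the talk does not say how porosity was actually measured.

```python
def porosity(mask):
    """Fraction of pixels marked as pore (1) in a binary mask of 0s and 1s."""
    total = sum(len(row) for row in mask)
    pores = sum(sum(row) for row in mask)
    return pores / total


# Hypothetical thresholded image: 1 = pore, 0 = matrix material.
mask = [
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
]

print(f"porosity: {porosity(mask):.0%}")
```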

Shruthi S. Bharadwaj:

Now based on that I started digging deeper. This is just the method of how we do it, how we get just the extracellular matrix. How we decellularize it: we remove the epithelial layer and then that is what you see. This is the same image, so this basically becomes that. That happens because you’re introducing this 10 degree tilt and you’re seeing what the protrusions are.

Shruthi S. Bharadwaj:

Now protrusions are, I don’t know if you can see it clearly, but these are the tiny protrusions. This is like a dark circle, right there is how deep it is. How deep something is or how high something is basically gave us a lot of information about the prediction of colon cancer.

Shruthi S. Bharadwaj:

Now from that image, we created something called a roughness profile. Now a roughness profile is basically telling us how high, how low, in nanometer scale. Using that, for my PhD and just for the thesis, I was looking at these four features. Now there were a lot more features that we could look at, but since it was a PhD thesis, there was limited time, so this is what we focused on. Because, based on literature, we thought these were the four most important features that would tell us more about this onset of colon cancer.
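To make the idea of summarizing a height profile with a few numbers concrete, here is a minimal sketch. The talk does not name the four features that were used, so the four computed below (Ra, Rq, Rp, Rv) are an assumption: they are standard surface-roughness parameters chosen only for illustration, and the height values are made up.

```python
import math


def roughness_parameters(heights_nm):
    """Summarize a height profile (in nm) with four standard parameters."""
    n = len(heights_nm)
    mean = sum(heights_nm) / n
    # Deviations of each height from the mean line of the profile.
    dev = [h - mean for h in heights_nm]
    return {
        "Ra": sum(abs(d) for d in dev) / n,            # mean absolute deviation
        "Rq": math.sqrt(sum(d * d for d in dev) / n),  # root mean square deviation
        "Rp": max(dev),                                # highest peak above the mean
        "Rv": -min(dev),                               # deepest valley below the mean
    }


# Hypothetical height profile along one scan line, in nanometers.
profile = [2.0, 4.0, 6.0, 4.0, 2.0, 0.0, 2.0, 4.0]
params = roughness_parameters(profile)
print(params)
```

Each patient's image would then be reduced to one such set of four numbers, which is the kind of per-patient feature row a classifier can learn from.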

Shruthi S. Bharadwaj:

Now, if you just plot this and compare normal versus carcinoma, this is roughness profile, average roughness profile over 500 patients, you see a drastic difference between what normal average looks like versus carcinoma. This was like huh, this is fabulous because if there is such a distinction between normal and carcinoma, there definitely is something that you can extract from this and use that information, relevant information to diagnose the onset. That was the idea that we were going for.

Shruthi S. Bharadwaj:

This is just the same, just giving you the numbers in nanoscale. All right, now this is where, I’m sorry if the images aren’t that clear, but essentially what I’ve used here is a support vector machine; it’s a classifier. I know some people already know what classifiers are. Support vector machine essentially is a supervised learning … You teach the machine, it’s a machine learning technique. You teach the machine, so it’s labeled data. You have 500 patients that have these four parameters I just said, the roughness profile parameters, and you also have the diagnosis attached.

Shruthi S. Bharadwaj:

You’ll say patient one has this roughness profile and has colon cancer. Patient two has this roughness profile but does not have cancer. So like that we have 1,000 patients, 500 each. We teach the machine and the machine learns from this dataset. Once it learns, you introduce another set. You introduce something that you don’t know. There, the diagnosis column is missing.

Shruthi S. Bharadwaj:

Now when you introduce it, it can predict, it can tell you, it can classify, just using the roughness profile, whether the patient is normal or has a tumor. Everything in red, I don’t know if you can distinguish the colors, but everything in red, it’s clustering again. Everything in red basically falls to one side; everything in green, normal, is on the other side. The point of this is that, using the information extracted from images, we were able to classify patients, diagnose patients, as either having a tumor or not having a tumor.

Shruthi S. Bharadwaj:

For today’s presentation I will basically end here, but this continues, so this isn’t the end. Classification is just one machine learning algorithm. Artificial intelligence, machine learning, the whole purpose of that is prediction. You should be able to predict something based on the humongous amount of data that you have. We did do that, but in this presentation I think we are almost running out of time, but yeah. Basically that is what we did, so I will stop here and just see if anyone has any questions. Yes?
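The train-then-predict workflow described here can be sketched with a small linear support vector machine. This is not the study's actual code or data: the patients are synthetic, the four features are assumed to be standardized (centered around zero), and `train_linear_svm`, `predict`, and `make_patients` are hypothetical helpers written for this illustration.

```python
import random


def train_linear_svm(X, y, epochs=100, lr=0.01, reg=0.01):
    """Train a linear SVM (hinge loss plus L2 penalty) by subgradient descent.

    Labels in y must be +1 (tumor) or -1 (normal).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin: move the boundary toward fixing it.
                w = [wj - lr * (reg * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: apply only the L2 shrinkage.
                w = [wj - lr * reg * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (tumor) or -1 (normal) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible


def make_patients(center, label, n):
    """Synthetic patients: four standardized features drawn around a center."""
    return [([rng.gauss(center, 0.3) for _ in range(4)], label) for _ in range(n)]


# Labeled training set: the diagnosis column is known.
train = make_patients(-1.0, -1, 20) + make_patients(1.0, 1, 20)
# Held-out set: labels are kept aside and used only to score the predictions.
test = make_patients(-1.0, -1, 10) + make_patients(1.0, 1, 10)

w, b = train_linear_svm([x for x, _ in train], [label for _, label in train])
correct = sum(predict(w, b, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so a high score here says nothing about real tissue data. It only shows the mechanics: learn from labeled rows, then classify rows whose diagnosis is withheld.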

Speaker 9:

Can you just [inaudible 00:50:26]?

Shruthi S. Bharadwaj:

Okay, so there is an entire sort of image, but the way we did it was that, after the classification that you see here, once we have all that information, we also looked at different grades. For each grade of colon cancer, there was a very specific roughness profile that we could generate. Even between, let’s say, grade one, stage one, all these different levels, in nanometer scale we could see a lot of difference. For each of those a similar concept was used, similar roughness profiles were generated.

Shruthi S. Bharadwaj:

After that what we did was, we used not classifiers but we used prediction … Sorry?

Speaker 9:

Regress.

Shruthi S. Bharadwaj:

Exactly, we used regression algorithms just to see how well can this predict. Based on the classification since the data was so easily separable, we were able to predict with 95% accuracy. Now for something like that I have to say, we only used four variables. Let’s say if we had used 20 variables maybe the outcome would be a little bit different, but with those four variables the prediction rate was about 95%. That’s basically what we tried to do. All righty.

Speaker 10:

[inaudible 00:51:46]?

Shruthi S. Bharadwaj:

Yes, please.

Speaker 10:

[inaudible 00:51:53], were they communicated to patients and physicians and they did some kind of I don’t know [inaudible 00:51:58]?

Shruthi S. Bharadwaj:

This was just a patent, so we are still trying to commercialize this. This started when I was a PhD student, so I graduated, but someone else tried to commercialize this. I graduated from University of Florida, so this still belongs to University of Florida. We could not directly impact patient lives using this because this was more of an experiment, this was more of a thesis that I was working on.

Shruthi S. Bharadwaj:

The idea of this was, yes it can be used, there’s nothing stopping us from using that, but again it has to jump a lot of hoops before we get there.

Speaker 10:

[inaudible 00:52:41]?

Shruthi S. Bharadwaj:

Yes, so there was a followup study that was published, so it was a clinical paper that was published. What they said was, with 95% accuracy, a lot of these people, no, it wasn’t 95% accurate, but they were. What we diagnosed or predicted did come true, but it’s a different publication. I’m not the author on that one, so I did not put it there, but it absolutely exists.

Speaker 11:

Were you able to correlate that with [inaudible 00:53:16]?

Shruthi S. Bharadwaj:

No, so we did a 10-year sort of study. Even now that 10-year period isn’t over, so it’s not that the study is complete, but this algorithm basically said that we can predict with 95% accuracy, over a 10-year time frame, that these patients will develop colon cancer.

Shruthi S. Bharadwaj:

We cannot still answer that question with certainty because 10 years hasn’t happened, but even before that, four years after this was published there was a followup. In that four years we already saw certain patients who developed colon cancer. All right, any other questions?

Shruthi S. Bharadwaj:

Okay, all righty, so I think that is basically the conclusion that I have for you for this presentation. One thing I do want to say is, artificial intelligence, machine learning is extremely cool. It’s fascinating, it’s unbelievably interesting to see how you can use this, use these different approaches. One thing we do have to keep very much in mind that it does not imply that it can cure everything. We have to understand that yes it’s usable, yes it’s great, but there are a lot of caveats that go with using AI and machine learning tools and all of these things.

Shruthi S. Bharadwaj:

All right, hopefully I haven’t discouraged people too much from getting into the AI, machine learning side of things. Honestly, I really do enjoy working on this, but yeah. Look more into it, learn more, read more. There’s a lot of things coming up. Every single day you see something new, knowledge graph especially is something new. You probably hear blockchain and this and that.

Shruthi S. Bharadwaj:

What we also have to understand is none of this is new. None of this is that groundbreaking. If you think about it, it’s all common sense, but common sense when applied to the relevant thing that you do becomes magic. I think that’s what the pharmaceutical industry right now is trying to do. Use what’s available and get something out of it, create impact. All right, that’s pretty much it. Thank you.