From Astrophysicist to Deep Sentinel Sr. Data Scientist
You have a PhD in Astrophysics and did research at Johns Hopkins University. What led you on this path?
I’ve liked astronomy since I was 13. I remember going to a local space museum on a school trip and feeling an intense connection to what I saw. I was fascinated by the mysteries of the Universe, and since I always loved math, I decided to do my undergraduate and master’s degrees in physics, and eventually go for a PhD in astrophysics. Going to Johns Hopkins University to continue my research, and working under Professor Alex Szalay, the father of astronomy databases, was a dream come true.
How did you apply your PhD in real life?
Astronomy has what you would call a Big Data challenge. The Sloan Digital Sky Survey (SDSS) helps to make sense of it all: it has imaged many galaxies and provides the most detailed three-dimensional maps of the Universe ever made. I joined the SDSS after I completed my PhD, and had the chance to collaborate with mathematicians, statisticians and computer scientists on the galaxy classification project. Our idea was that if we could tell exactly how similar one image is to another, we could better classify galaxies. Because a galaxy image is high-dimensional (many pixels and colors, each one a separate dimension), the problem turns out to be very challenging, and it requires machine learning algorithms and a lot of computational power.
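To make the idea of "how similar one image is to another" concrete, here is a minimal sketch that treats each image as a point in pixel space and measures the straight-line distance between two points. This is only the simplest possible similarity measure, not the method used on the SDSS project; the function names and the tiny 2x2 "images" are hypothetical and chosen purely for illustration.

```python
import math


def flatten(image):
    """Turn a 2-D grid of pixel intensities into one long vector."""
    return [pixel for row in image for pixel in row]


def euclidean_distance(image_a, image_b):
    """Distance between two same-sized images in pixel space (0 means identical)."""
    a, b = flatten(image_a), flatten(image_b)
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 2x2 grayscale images; real galaxy images have many more
# pixels and several color bands, so the vectors are far longer.
spiral = [[0.9, 0.1], [0.1, 0.9]]
spiral_noisy = [[0.8, 0.2], [0.1, 0.9]]
elliptical = [[0.5, 0.5], [0.5, 0.5]]

# The noisy copy should sit closer to the original than the other galaxy does.
closer_to_copy = (
    euclidean_distance(spiral, spiral_noisy) < euclidean_distance(spiral, elliptical)
)
print(closer_to_copy)
```

With real images, raw pixel distance is easily thrown off by shifts, rotations and brightness changes, which is part of why the problem calls for machine learning rather than a fixed formula.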
How does this tie in to the work you do at Deep Sentinel?
At Deep Sentinel, we use AI-powered computer vision and deep neural network technology to deliver intelligent home security. Computer vision helps our cameras to identify what they see. The way we structure and solve the problem relies a lot on classification. My experience in analyzing galaxy images comes in handy.
Previously, I built a system that classified images by computer. But in my head, I was also building the mathematical intuition for the classification problem. I thought about point distributions and how to cluster them in high-dimensional pixel space. This kind of thinking helps me move naturally into deep learning, a technology that allows us to group data by class, or in the case of Deep Sentinel’s technology, to answer the question: is this an image of a “Person” or a “Car”?
How are you training the machine learning system at Deep Sentinel?
We started by picking a deep learning model architecture. At Deep Sentinel, we use state-of-the-art models that achieved the highest classification accuracy in recent ImageNet competitions. The second stage is to provide data to train our algorithms. In our case, that’s a set of images and their labels (for example, “Person” or “Car”), collected from the initial users of our cameras. Finally, we evaluate a model by testing it on images that it has not seen before. Our goal is to drive the false negative rate (the fraction of real intruders the system fails to identify) as close to zero as possible, and to identify exactly what’s in the image. Then, once the Deep Sentinel system is installed on the owner’s property, we’ll further improve the model by training it with additional images that are specific to that particular user.
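The evaluation step can be sketched in a few lines. The example below computes a false negative rate from held-out labels and model predictions. It is an illustrative sketch only: the function name, the sample labels, and the choice to treat “Person” as the class that must not be missed are assumptions, not Deep Sentinel’s actual evaluation code.

```python
def false_negative_rate(true_labels, predicted_labels, positive="Person"):
    """Fraction of actual positives that the model failed to flag.

    `positive` is the class we cannot afford to miss; treating "Person"
    as that class is an assumption made for this illustration.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    positives = 0
    missed = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive:
            positives += 1
            if prediction != positive:
                missed += 1
    if positives == 0:
        raise ValueError("no positive examples to evaluate")
    return missed / positives


# Hypothetical held-out images the model never saw during training.
held_out_truth = ["Person", "Car", "Person", "Person", "Car", "Person"]
held_out_predictions = ["Person", "Car", "Car", "Person", "Person", "Person"]

fnr = false_negative_rate(held_out_truth, held_out_predictions)
print(f"False negative rate: {fnr:.2f}")
```

Here one of the four real “Person” images was labeled “Car”, so a quarter of the positives were missed. The “Car” image mislabeled as “Person” is a false positive and does not count toward this rate.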
What does it take to be a good data scientist? Does your background in Astrophysics aid your work in data science?
A good data scientist is a bit like a Renaissance man or woman who possesses a mix of backgrounds. S/he needs to be a competent programmer, know about data structures and algorithms, and have a solid understanding of math, statistics and probability. But being an expert in all of these does not guarantee success. The ability to think outside the box and to combine these disciplines in an exploratory manner is also important. Especially at a startup (compared to a big company or academia, where you might work on a smaller part of a problem), you have to be a self-starter. When you can’t Google it, you need to step into the shoes of any number of experts in order to solve the problem.
Understanding the Universe is no different. Astronomy is transitioning from a data-starved discipline to one facing a data deluge. Domain astronomers have to come up with new ways to extract information from the data. They have to learn new technologies that were simply unnecessary in the old curriculum. Back then, I trained machines to classify complex galaxies. Today, I’ve brought that experience to teaching machines to recognize images of people, cars, and animals, in order to make home security more effective. My background as a modern astrophysicist has shaped me into a better data scientist, and I’m excited about the work ahead!