The Wide World of Data Science

The Analyst is a podcast about how to use data to solve problems, in business and beyond.

In our inaugural episode, we chat with Jessica Kirkpatrick, an astrophysicist-turned-data scientist currently working at Hired. Hired’s marketplace brings together highly qualified individuals with the companies that want to hire them.

Given her uniquely interdisciplinary background applying data to different problems in different industries—including academic, government, and business applications—we discuss what “data science” really means at companies in 2016 and how data can be used directly to further a company’s mission internally and in the public sphere.

Episode Transcript

Sarah: Thank you so much for joining us, Jessica. Before you worked in data science, you received a PhD in astrophysics. Could you share your story about going into that field and then transitioning out of it?

Jessica: I took physics for the first time as a senior in high school, and I was not particularly excited about physics, but it was something that my guidance counselor told me I should do to get into college. I fell in love. I was fascinated with the idea of being able to translate what you see in the world into math and equations. I kept studying it for as long as they would let me, which meant that I ended up getting a PhD in astrophysics, and it involved using data from a telescope called the Sloan Digital Sky Survey, which has been scanning the night sky for 15 years now, taking pictures of different astronomical objects.

For my PhD, I used this huge data set of millions and millions of stars and galaxies and other types of objects to try to understand more about the way the universe works. I particularly was interested in objects called quasars, which are really, really bright, distant galaxies, and trying to understand a little bit more about them.

I had, at that point, been doing graduate level work in astrophysics for ten years. I did two masters and a PhD and felt like I was ready to learn something new. I was getting to the point where the types of things I was learning had slowed down in the last few years, and so I was trying to think about what I could do with my skillset that would be slightly different and expand my domain.

I looked into jobs in finance, consulting, and data, and ultimately decided that data science was a really good way to utilize my skills. It’s got a really similar set of day-to-day tasks, but in a different domain. Still doing a lot of coding, a lot of math, a lot of statistics, a lot of modeling, but instead of the datasets being stars and galaxies in this huge astronomical dataset, it’s human behavior.

Sarah: You’ve worked in academia, as you mentioned, and you’ve created workshops for academics about data science. You’ve helped the team at Hired use data internally, written for publications like Quartz, and worked with the educational nonprofit Data for America. For our audience, which includes people performing numerous functions in different types of organizations, how would you define data science?

Jessica: Data science is, at this point, a catch-all term, and it means a lot of different things to different companies. For some companies, data science is what used to be called analytics, trying to use numbers and insights to influence business decisions. For some companies, data science means machine learning and algorithms, trying to optimize the decisions that a website is making, using unsupervised machines. For some people, data science is about infrastructure and having a pipeline that takes the raw events and clicks that happen on a website or an app and turns them into metrics that can be used for decision-making.

Data science is anything that incorporates data and measurements and uses them to make a company perform better. I think of it as applying the scientific method to datasets that are company-specific, versus data that you would collect in some other experiments.

I’ve found that smaller companies will tend to have one or two data people that do a wide variety of things because they can’t afford to have people who specialize in machine learning, and data architecture, and analytics.

As companies scale, data teams tend to become more specialized or even embedded within certain product teams. For instance, Facebook has data scientists who work specifically on the algorithm that decides the order of your news feed. That’s a whole team of people that focuses on that very specific part of the product. A smaller company might have a data scientist who works on a wide variety of things, all over the product, as well as building infrastructure, doing data modeling, and helping with analytics and decision-making at the strategic and business level.

Sarah: How would you say data science is different from or inclusive of practices like analytics, architecture, and data engineering?

Jessica: There are certain jobs that would not be encompassed by what was historically considered analytics or data engineering. Specifically, jobs involving machine learning, natural language processing algorithms, or A/B testing. Those were not traditionally done in analytics-type roles. They’re newer methods for applying the scientific method to company data.

As we become more and more digital, there’s just more information that’s being collected about every single pathway that someone navigates through their purchasing decision. If they’re buying something on a website, or browsing through a website, you have a lot more information about the steps that people are taking, whereas before, when you were going into a store and buying something off the shelf, we would know what unit you bought and where you bought it, but wouldn’t necessarily know how long you took to make that decision, or which other objects you picked up and read before you made that decision.

Now we know that, because we know which pages you look at, how long you’re on the page, and whether you put it in your cart and buy it instantly or save it for later. There’s just a lot more context, even with brick and mortar companies.

That requires people who are specialized in handling data, doing statistics, and applying the scientific method to datasets, which is why a lot more PhDs are coming in and doing the role of data scientist. Before, you might have an analyst who maybe has a background in finance, or a business degree, but not necessarily the research and science background that a lot of data scientists have.

Sarah: What do data science and analytics practices look like at Hired, both when you began and since?

Jessica: We have two main data teams at Hired. One is our machine learning and algorithms team, which works on powering the matching engine that decides which job-seeking candidates are best for which roles that companies are hiring for, and helps make those connections in an automated way.

The other piece is deciding which companies and which candidates are of the caliber that are right for our platform, so that there’s both the demand and the supply for the roles, but also that they actually have the right skill sets that our companies are looking for, or the company is the type of company that our candidates want to work for. That’s the machine learning team, and that’s embedded in the engineering team.

Then we have a team that I’m on called insights, that does all of the other things that involve data. We’re helping executives with strategy, we do modeling and projections to help set goals for the company, and we do reporting so that every team understands how they’re performing, where they’re underperforming, and where they can focus to improve. We also run all product tests.

Sarah: What does a thriving data culture look like?

Jessica: The last company that I joined was a startup named InstaEDU, which has since been bought by Chegg. I was the 11th employee. It was a very early decision to hire someone purely focused on data; usually, companies tend to wait until they are ~50 people to get a dedicated data person because you don’t have tons of data early on.

One thing that I thought was really smart about that decision was that I could help us be thoughtful about how we were going to scale the website and product and what information we should collect, such that we have the right things in place to do these analyses later on.

That involved a lot of partnering with our head of engineering, who had already done a really good job of keeping things quite organized. But just having my perspective of knowing what we might need to do when we’re two years out or five years out and being able to prepare for that and set up for that scale was really helpful.

Early on, my advice would be to try to get people in, whether full-time data people or advisors, to think about how you can start laying the proper foundation to be data-driven.

Sarah: Given your breadth of experience, how do you assess what makes a great data scientist?

Jessica: I tend not to interview and assess candidates based on a particular technical skill or a coding test. I think it’s quite easy to teach someone a particular coding language or a particular technical skill. What I think is harder to teach is problem-solving ability, being able to communicate well, and being able to make tough prioritizations when it’s not exactly clear which direction to go.

I tend to assess candidates by presenting problems that are similar to what I’m working on and asking how they would approach them. I try to dig in deeply: what they would do if they came to a fork in the decision, how they would make that choice, and how they would assess afterwards whether it was the right or wrong choice. Through all of that, I’m trying to understand how they organize their thoughts.

Sarah: Data access and transparency can further an organization’s mission in a number of ways, like improving the product experience, understanding behavior to boost conversion rates, and, depending on the nature of your product, sharing original data and research with the public.

I recently read your study on the gender wage gap where you applied Hired data in the service of both the organization’s mission and the broader conversation around wage equality. Can you explain this study and the impact it’s had both at Hired and beyond?

Jessica: One thing that’s really fun about being a data scientist is when you have the opportunity to take the proprietary data that your company has and use it to inform the wider world, or the wider industry, about how things work in a way that they might not have previously known.

One of the things that got me really interested in data science was OkCupid’s blog, OkTrends. They have a lot of really fun analyses that they’ve done based on their dating website data. When I started at Hired, I said that I would really love for us to have a similar philosophy to use our data to understand how hiring works, to try to understand what’s working, what’s not working, and then use that to help us educate the general industry.

[Chart: gender wage gap]

The wage gap study started off as a hack project that I did for a hack day. I looked at the candidates on our platform and at how women and men were performing differently in terms of how many interviews they were getting, the salaries that they were getting, how many offers they were getting, and how likely they were to get hired. That eventually turned into a report that we put out for Equal Pay Day in April 2016, and you can find it on our website and our blog.

What has been found many other times is that women are getting paid less than their male counterparts, even when you control for things like years of experience or job title. Because of the way that Hired is set up, we allow our candidates to set their preferred salary, so that employers know from the get-go what they’re expecting to make. We did find that if we controlled for candidates asking for the same amount of money, men and women were getting essentially what they were asking for, and therefore the wage gap was going away.

That’s led us to start working on product initiatives where we help candidates set their salaries not based on the salary that they made at their last job, which might have some biases or discrimination built into it from previous experience, but based on what people with their skillset and their experience are currently getting offered on our platform. Using up-to-date market data, the hope is that both men and women will understand their worth and ask for what they are worth, instead of what they are currently getting, which, statistically, is less for women than for men.

We repeated the study on other demographics, and we’ll be releasing that shortly. The hope is to just continue to educate people about various ways that discrimination is happening, so that they can combat biases and get what they deserve in terms of salaries.
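To make the idea of "controlling for" the asking salary concrete, here is a minimal Python sketch. It is an illustration only, not Hired's data or method: the records, the field layout, and the function names `raw_gap` and `gap_by_ask` are all hypothetical, and the numbers are made up so that the effect is easy to see. It compares the overall difference in mean offers with the difference among candidates who asked for the same amount.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gender, asking_salary, offered_salary).
# All numbers are invented for illustration; they are not Hired data.
candidates = [
    ("F", 100_000, 100_000),
    ("F", 100_000, 101_000),
    ("F", 120_000, 120_000),
    ("M", 100_000, 100_500),
    ("M", 120_000, 119_000),
    ("M", 120_000, 121_000),
]

def raw_gap(rows):
    """Mean offer for men minus mean offer for women, ignoring the ask."""
    men = [offer for gender, _, offer in rows if gender == "M"]
    women = [offer for gender, _, offer in rows if gender == "F"]
    return mean(men) - mean(women)

def gap_by_ask(rows):
    """The same difference, computed separately for each asking salary."""
    groups = defaultdict(lambda: {"M": [], "F": []})
    for gender, ask, offer in rows:
        groups[ask][gender].append(offer)
    # Only compare asking-salary groups that contain both men and women.
    return {
        ask: mean(offers["M"]) - mean(offers["F"])
        for ask, offers in groups.items()
        if offers["M"] and offers["F"]
    }

print("Raw gap:", raw_gap(candidates))
print("Gap within each asking salary:", gap_by_ask(candidates))
```

In this toy data the men ask for more on average, so the raw gap is positive, while the gap inside each asking-salary group is zero. A real analysis would more likely use regression with several controls, but the grouping above shows the basic logic.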

Sarah: You mentioned in a related article that HR should use data, which is not exactly a team or department that people immediately think of as being data informed or data driven. Could you expand on that, perhaps in the context of the gender wage gap study?

Jessica: I’ve recently partnered with our HR team and our head of people to release our internal demographics and internal wage data for Hired itself. We released that last week on our diversity page. I believe that you can’t make changes to a company if you don’t have baselines. At Hired, we’re trying to understand where we’re losing candidates in the pipeline. Are we not getting enough underrepresented candidates in the door? Are they not making it to final interviews? Are we hiring them, but not retaining them? How can we improve these things?

Similarly with wages, we’re trying to be very data-driven about how we are compensating people, continuing to assess whether the compensation is fair given the market value for that role, and making sure that any promotions or raises are backed by evidence that they are deserved and aren’t coming from a biased perspective.

If you’re not measuring these things and having some objective lens to make sure that there aren’t big problems going on in your organization, then you could end up having a propagation of discrimination and other types of issues from a people perspective. People are now going into people analytics, so there are data teams that are just studying internal hiring data, usually at very big companies like Google or Facebook.
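The pipeline questions raised in this answer (getting candidates in the door, reaching final interviews, being hired, being retained) amount to measuring how many people advance from each stage to the next. The Python sketch below shows one way such a baseline could be computed. The stage names, the counts, and the helpers `stage_rates` and `biggest_drop` are assumptions made for illustration, not a description of Hired's internal reporting.

```python
# Hypothetical hiring stages, in order. Names are illustrative assumptions.
STAGES = ["applied", "final_interview", "hired", "retained_1yr"]

def stage_rates(counts):
    """Share of candidates who advance from each stage to the next."""
    rates = {}
    for prev, nxt in zip(STAGES, STAGES[1:]):
        # Avoid dividing by zero when nobody reached the earlier stage.
        rates[f"{prev}->{nxt}"] = counts[nxt] / counts[prev] if counts[prev] else None
    return rates

def biggest_drop(counts):
    """Name of the transition with the lowest pass-through rate."""
    rates = {step: r for step, r in stage_rates(counts).items() if r is not None}
    return min(rates, key=rates.get)

# Made-up counts for one group of candidates.
example = {"applied": 200, "final_interview": 40, "hired": 10, "retained_1yr": 9}

print(stage_rates(example))
print("Largest drop-off:", biggest_drop(example))
```

Running the same calculation for each demographic group and comparing the rates stage by stage would show where in the pipeline a particular group is being lost, which is the kind of baseline the answer describes.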

Sarah: What do you love about data and data science personally?

Jessica: To me, it feels like conducting magic. A person will come to me and they’ll have a very abstract question like, “Which geographical location should Hired go into next?” It’s very open-ended, and there’s not necessarily a clear way to go about measuring that. It’s fun to look at the various information we have and try to find a way to make a measurement that is both efficient and accurate, then support people in their decisions so that they can feel more confident.