The data science world can seem abstract from the outside, but on the inside it’s full of fascinating work. Today I sat down with Joshua Weaver, a data scientist, thought leader, and Analytics Manager at Amazon in the Human Resources department. In his work, Joshua leads a team of HR analytics professionals tasked with using data to support 18,000+ employees spanning 6 organizations. He graduated from Seattle Pacific University with a PhD in Industrial/Organizational Psychology.
Let’s start with the basics: why did you originally choose a career in data science?
I fell into data science and analytics when I was first discovering statistics in grad school. I was fascinated by the ability to measure squishy concepts like job satisfaction or job performance. It seemed data and analytics provided clearer paths for interventions and had a way to scale impact. I’ve always wanted to have a positive impact and have looked for ways to magnify that impact—and data science is a natural fit for putting those ideas into action.
Your work as a data scientist at Amazon is specifically in the realm of Human Resources. Can you tell me a bit about the challenges you’re addressing and dealing with day to day?
The things we’re working on right now are representative of any company climbing the data analytics maturity curve. We are focused on getting the fundamentals right: data quality, data integrity, taxonomy, and data processing, all with an eye on enabling on-demand, self-service reporting. Establishing operational reporting provides the critical foundation for more advanced analytics like predicting turnover, career success, and so on. The other challenge we are addressing, and one that most BI [Business Intelligence] teams overlook, is the emotional change-management process that organizations have to go through on their way toward being truly data savvy and data informed. This involves a high level of customer interaction, customer obsession, and a willingness to help people through the change process.
Why do you think HR departments and so many companies are only now making strides toward full data maturity?
Well, this will show my bias as a technology and product guy, but historically HR departments have solved issues using processes and people, not technology. That isn’t to say they are opposed to using data, but data science hasn’t been a core competency, or one the business has demanded.
HR is in much the same place that IT was 10 years ago: it is transitioning from a purely functional piece of the organization to a part that actually drives value, scale, and growth. IT is ahead on this; it is consistently seen as a force multiplier and deployed to other parts of the business. That’s the direction HR is moving.
What do you wish non-initiated folks knew about data analytics? If you had a bit of advice for getting started, what would it be?
Data science isn’t magic. And algorithms aren’t pointer dogs that miraculously find insights in swaths of data. Data analytics won’t do your thinking for you (yeah, yeah, I know that Google’s neural network beat the international Go Champion). Yes, the user experience can be magical, but the process to create that experience is extraordinarily non-trivial. Also, an algorithm or dashboard won’t fix bad data. As the saying goes, you can put lipstick on a pig….
For folks getting started, focus on research methods and research design. These are skills taught in social sciences and are highly, highly undervalued. Research design is about answering the question “how do you design a particular data study and data collection to answer a particular question?” If you approach this right, research design will actually tell you the analysis you need to do and tell you what kind of structure you need the data in.
Too often, I’ve been to meetings where someone says, “Here’s a bunch of data; do something sexy with it.” But that’s backwards, post-hoc analysis. If we had started with the business (research) question and worked backward, we would have been able to deliver much more value. Starting with the research question and working backward is what research designs and methods force you to do. Sure, algorithms are going to help you solve for a lot of things, but you have to know what in the world you are solving for in the first place.
That’s the biggest piece that’s missing when I interview people—the fundamental understanding of how to design data collections in a way that lets you actually answer a question instead of just taking data and trying to fit it to an approach after the fact.
Tell me what you think is the most interesting issue facing the data science industry today.
There are a couple of really big-picture things I find interesting. One of them is “techno-ethics,” which has to do with the ethical implications of technology. Here’s a dark example: you’re in a self-driving car, the car is going to crash, and it can crash into a cyclist with a helmet, a cyclist without one, or a cement wall. If the self-driving car has to kill someone, who should it be: the passenger or the cyclist? When we’re talking about machines that can make decisions like this, it is critical to include ethics in the conversation. This also connects back to a point I made earlier, about data science and analytics not being magic. The ethical decision-making frame and the assumptions we bring to the products we create matter a lot. And a self-driving car is just one example.
The other one on my mind right now, given the current economic and global climates, is how to use data to get people to act in their best interests. Not to manipulate people, but to help them act in ways that support their health. For example, how do you get a population to support public health initiatives that benefit them and their community?
I think those are some big areas that analytics is going to play in, and I’m hopeful that analytics professionals rise to the challenge and engage in an intentional manner, with an eye on having a positive impact.