Today, when an enterprise wants to use machine learning to solve a problem, it has to call in the cavalry. Even a simple problem requires multiple data scientists, machine learning experts, and domain experts to come together to agree on priorities and exchange data and information.
This process is often inefficient, taking months to deliver results, and it only solves the problem immediately at hand. The next time something comes up, the enterprise has to start all over again.
One group of MIT researchers wondered, “What if we tried another strategy? What if we created automation tools that enable the subject matter experts to use ML, in order to solve these problems themselves?”
For the past five years, Kalyan Veeramachaneni, a principal research scientist at MIT’s Laboratory for Information and Decision Systems, along with Max Kanter and Ben Schreck, who began working with Veeramachaneni as MIT students and later co-founded the machine learning startup Feature Labs, has been designing a rigorous paradigm for applied machine learning.
The team first divided the process into a discrete set of steps. One step, known as “feature engineering,” involves searching for buried patterns with predictive power. Another, called “model selection,” involves choosing the best modeling technique from the many available options. The team then automated these steps, releasing open-source tools to help domain experts complete them efficiently.
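To make the “model selection” step concrete, here is a minimal, illustrative sketch, not the team’s own tooling, that compares a few candidate modeling techniques with cross-validation in scikit-learn; the dataset and candidate estimators are placeholders.

```python
# Illustrative only: a bare-bones "model selection" step using scikit-learn.
# The dataset and candidate models are placeholders, not the MIT team's tools.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

# Score each candidate with 5-fold cross-validation and keep the best one.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores)
print("selected model:", best_name)
```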
In their new paper, “Machine Learning 2.0: Engineering Data Driven AI Products,” the team brings together these automation tools, turning raw data into a trustworthy, deployable model over the course of seven steps. This chain of automation makes it possible for subject matter experts — even those without data science experience — to use machine learning to solve business problems.
“Through automation, ML 2.0 frees up subject matter experts to spend more time on the steps that truly require their domain expertise, like deciding which problems to solve in the first place and evaluating how predictions impact business outcomes,” says Schreck.
Last year, Accenture joined the MIT and Feature Labs team to undertake an ambitious project — build an AI project manager by developing and deploying a machine learning model that could predict critical problems ahead of time and augment seasoned human project managers in the software industry.
This was an opportunity to test ML 2.0’s automation tool, Featuretools, an open-source library funded by DARPA’s Data-Driven Discovery of Models (D3M) program, on a real-world problem.
Veeramachaneni and his colleagues collaborated closely with domain experts from Accenture at every step, from figuring out the best problem to solve to running the model through a robust gauntlet of testing. The first model the team built predicted the performance of software projects against a host of delivery metrics. When testing was completed, the model correctly predicted more than 80 percent of project performance outcomes.
Using Featuretools involved a series of human-machine interactions. In this case, Featuretools first recommended 40,000 features to the domain experts. Next, the humans used their expertise to narrow this list down to the 100 most promising features, which they then put to work training the machine-learning algorithm.
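Featuretools itself is open source, so the feature-generation step can be sketched on the library’s bundled demo data (the Accenture project data is not public). The primitives and dataset below are illustrative, and exact argument names can vary between Featuretools versions.

```python
# A minimal sketch of automated feature engineering with Featuretools,
# using the library's built-in demo data rather than proprietary project data.
import featuretools as ft

# Load a small mock dataset shipped with the library
# (customers, sessions, transactions, products).
es = ft.demo.load_mock_customer(return_entityset=True)

# Deep Feature Synthesis generates candidate features for the target
# dataframe by stacking aggregation and transform primitives.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["mean", "sum", "count"],
    trans_primitives=["month", "weekday"],
    max_depth=2,
)

print(feature_matrix.shape)  # rows = customers, columns = generated features
print(feature_defs[:10])     # inspect the first few candidate features
```

In practice, the domain experts would then inspect the generated feature definitions and keep only those that make business sense, which is how the team narrowed 40,000 candidates down to 100.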
Next, the domain experts used the software to simulate deploying the model, testing how well it would perform as new, real-time data came in. This method extends the “train-test-validate” protocol typical of contemporary machine-learning research, making it more applicable to real-world use. The model was then deployed, making predictions for hundreds of projects on a weekly basis.
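One hedged way to picture this simulation step, again on the library’s demo data, is Featuretools’ cutoff-time mechanism, which computes each prediction’s features using only data available before a given point in time; the customer IDs and dates here are placeholders.

```python
# Illustrative sketch of "simulating" deployment on historical data with
# Featuretools cutoff times: features for each prediction are computed using
# only data available before that point in time, mimicking periodic
# real-time predictions. IDs and dates below are placeholders.
import pandas as pd
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

# One row per prediction: which customer, and "as of" what time.
cutoff_times = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "time": pd.to_datetime(["2014-01-05", "2014-01-12", "2014-01-19"]),
})

# Deep Feature Synthesis respects the cutoff time, so no future data leaks
# into the features used for each simulated prediction.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    agg_primitives=["count", "sum"],
    max_depth=2,
)

print(feature_matrix.head())
```

Restricting each feature computation to its cutoff time is what lets a historical dataset stand in for the weekly, real-time predictions described above.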
“We wanted to apply machine learning (ML) to critical problems that we face in the technology services business,” says Sanjeev Vohra, global technology officer, Accenture Technology. “More specifically, we wanted to see for ourselves if MIT’s ML 2.0 could help anticipate potential risks in software delivery. We are very happy with the outcomes, and will be sharing them broadly so others can also benefit.”
In a separate joint paper, “The AI Project Manager,” the teams walk through how they used the ML 2.0 paradigm to achieve fast and accurate predictions.
“For 20 years, the task of applying machine learning to problems has been approached as a research or feasibility project, or an opportunity to make a discovery,” says Veeramachaneni. “With these new automation tools, it is now possible to create a machine learning model from raw data and put it to use within weeks.”
The team intends to keep honing ML 2.0 in order to make it relevant to as many industry problems as possible. “This is the true idea behind democratizing machine learning. We want to make ML useful to a broad swath of people,” he adds.
In the next five years, we are likely to see an increase in the adoption of ML 2.0. “As the momentum builds, developers will be able to set up an ML apparatus just as they set up a database,” says Max Kanter, CEO of Feature Labs. “It will be that simple.”
MIT Laboratory for Information and Decision Systems
http://news.mit.edu/2018/ml-20-machine-learning-many-data-science-0306