Data integration (DI) has been a long-standing challenge in the data management community. So far, the vast majority of DI work has focused on developing DI algorithms. Going forward, we argue that far more effort should be devoted to building DI systems in order to advance the field. DI is engineering by nature, so we cannot keep developing DI algorithms in a vacuum. At some point we must build end-to-end systems to evaluate the algorithms, to integrate research and development efforts, and to have practical impact.
The question, then, is what kind of DI systems we should build, and how. In this direction we focus on identifying problems with current DI systems, then developing a radically new agenda for building DI systems. DI systems of this new kind have the following distinguishing characteristics:
1. They guide the user through the end-to-end DI workflow, step by step.
2. For each step, they provide automated or semi-automated tools to address the "pain points" of the step.
3. The tools aim to cover the entire DI workflow, not just a few steps as current DI systems often do.
4. The tools are built on top of a data science and big data ecosystem. Today the two most popular such ecosystems are built on R and Python. We currently target the Python data science and big data ecosystem.
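To make the idea of a step-by-step workflow with one tool per step more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy tables, the choice of three steps (blocking, matching, merging), the function names `block`, `match`, and `merge`, and the similarity threshold are all hypothetical and do not describe any particular DI system. It uses only the standard library so that it runs anywhere; in a system built on the Python data science ecosystem, each step would more plausibly operate on data frames.

```python
from difflib import SequenceMatcher

# Two toy tables to integrate (hypothetical data, for illustration only).
table_a = [
    {"id": "a1", "name": "Dave Smith", "city": "Madison"},
    {"id": "a2", "name": "Joe Wilson", "city": "Chicago"},
]
table_b = [
    {"id": "b1", "name": "David Smith", "city": "Madison"},
    {"id": "b2", "name": "Joseph Wilson", "city": "Boston"},
    {"id": "b3", "name": "Ann Lee", "city": "Chicago"},
]


def block(a, b):
    """Step 1 (blocking): keep only record pairs that share a city."""
    return [(x, y) for x in a for y in b if x["city"] == y["city"]]


def match(pairs, threshold=0.8):
    """Step 2 (matching): keep pairs whose names are similar enough.

    The threshold of 0.8 is an arbitrary value chosen for this sketch.
    """
    return [
        (x, y)
        for x, y in pairs
        if SequenceMatcher(None, x["name"].lower(), y["name"].lower()).ratio()
        >= threshold
    ]


def merge(matches):
    """Step 3 (merging): combine each matched pair into one record,
    keeping the longer of the two names."""
    return [
        {
            "ids": (x["id"], y["id"]),
            "name": max(x["name"], y["name"], key=len),
            "city": x["city"],
        }
        for x, y in matches
    ]


# Run the workflow one step at a time, so the user can inspect
# the intermediate result after each step.
candidates = block(table_a, table_b)
matches = match(candidates)
merged = merge(matches)
print(merged)
```

Because each step is a separate function with an inspectable result, a user can examine the candidate pairs after blocking or adjust the threshold before matching, which is the kind of guided, step-wise interaction the list above describes.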