Database Seminar

The database seminar is a weekly gathering of students and faculty interested in database systems. Each meeting lasts for an hour and consists of presentations of recent research papers, practice talks, or invited talks. Local speakers may choose to present their own research work or recent publications from database conferences or related fields. Students interested in database systems are strongly encouraged to attend the seminar.

Current Schedule

  • May 8 2017
    TBD
    Pradap KV
    2310 CS 12:15 PM
    Abstract :

    TBD


    Speaker Bio :

    TBD


  • April 17 2017
    Accelerating Linear Algebra over Normalized Data
    Lingjiao Chen
    2310 CS 12:15 PM
    Abstract :

    Providing machine learning (ML) over relational data is a mainstream requirement for data management systems. Recently, there has been growing interest in optimizing the performance of ML by exploiting the redundancy generated by joins, an approach called "factorized learning". However, existing work requires a manual rewrite of each ML implementation, which creates a massive development overhead when extending factorized learning to other ML algorithms.

    In this talk, I will discuss how to mitigate this overhead by factorizing a common formal language used to represent the computations of many ML algorithms: linear algebra. We introduce a new logical data type to represent normalized data and devise a framework of algebraic rewrite rules to convert a large set of linear algebra operations over denormalized data into operations over normalized data. This enables us to automatically "factorize" several popular ML algorithms, thus unifying and generalizing several prior works. We prototype our framework in the popular ML environment R and in an industrial R-over-RDBMS tool. Experiments with both synthetic and real normalized data show that our framework yields significant speed-ups, up to 36x on real data.

    Joint work with Arun Kumar, Jeff Naughton and Jignesh M. Patel.
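
    The core rewrite idea can be illustrated with a toy sketch (the table names, dimensions, and data here are my own illustrative choices, not from the paper). Multiplying the denormalized join result by a weight vector recomputes the same dimension-table rows many times; pushing the multiplication through the join computes each dimension row's contribution once and reuses it via the foreign key:

    ```python
    import numpy as np

    # Toy setup: S has n_s rows with features X_s and a foreign key into R;
    # R has n_r rows with features X_r. The join T = S join R replicates each
    # R row once per matching S row.
    rng = np.random.default_rng(0)
    n_s, n_r, d_s, d_r = 1000, 10, 3, 4
    X_s = rng.standard_normal((n_s, d_s))
    X_r = rng.standard_normal((n_r, d_r))
    fk = rng.integers(0, n_r, size=n_s)      # foreign key column of S

    w = rng.standard_normal(d_s + d_r)       # model weight vector
    w_s, w_r = w[:d_s], w[d_s:]

    # Materialized: build the denormalized matrix, then multiply.
    T = np.hstack([X_s, X_r[fk]])            # shape: n_s x (d_s + d_r)
    y_mat = T @ w

    # Factorized: push the multiplication through the join. X_r @ w_r is
    # computed once per R row (n_r of them) instead of once per joined row.
    y_fac = X_s @ w_s + (X_r @ w_r)[fk]

    assert np.allclose(y_mat, y_fac)         # same result, less redundant work
    ```

    The saving grows with the fan-out of the join: here each R row is touched once instead of roughly n_s / n_r times.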


    Speaker Bio :

    Lingjiao is a graduate student at the University of Wisconsin-Madison. His research interests span data management (especially advanced data analytics) as well as machine learning and optimization. He is particularly interested in accelerating data analytics workloads from both theory and systems perspectives.


  • April 10 2017
    A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries
    Bas Ketsman
    2310 CS 12:15 PM
    Abstract :

    This talk presents results about the worst-case optimal communication cost for parallel evaluation of conjunctive queries. Particularly, we describe a multi-round algorithm that computes any conjunctive query with load m/p^{1/\rho*} per server, in the case where all input relations are binary. Here, \rho* denotes the fractional edge covering number of the query hypergraph, m is the maximum size of relations, and p is the number of servers.

    In this setting, two main prior results were known. First, for one-round algorithms over skew-free data, the optimal communication cost per server is m/p^{1/\tau*}, where m is the size of the largest input relation and \tau* is the fractional vertex covering number of the query hypergraph. Our algorithm exploits this result to obtain the mentioned bound when no skew is present. Second, it has recently been proven that for multi-round algorithms and unrestricted database instances, any algorithm requires at least m/p^{1/\rho*} communication cost per server. Hence, our algorithm is essentially optimal.

    Joint work with Dan Suciu.
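
    To make the bounds concrete, here is a back-of-the-envelope calculation for the triangle query (the values of m and p are my own illustrative choices):

    ```python
    # Triangle query Q(x,y,z) :- R(x,y), S(y,z), T(z,x).
    # Assigning weight 1/2 to each of the three atoms covers every variable,
    # so the fractional edge covering number is rho* = 3/2. Assigning weight
    # 1/2 to each variable covers every atom, so tau* = 3/2 as well.
    rho_star = 3 / 2
    tau_star = 3 / 2

    m = 2**30   # size of the largest relation (illustrative)
    p = 64      # number of servers (illustrative)

    # Worst-case optimal multi-round load: m / p^(1/rho*) = m / p^(2/3)
    load_multi_round = m / p ** (1 / rho_star)
    # One-round optimal load on skew-free data: m / p^(1/tau*)
    load_one_round = m / p ** (1 / tau_star)

    print(round(load_multi_round))   # roughly 67 million tuples per server
    ```

    For the triangle the two exponents happen to coincide. For other queries they differ: a two-atom path R(x,y), S(y,z) has \tau* = 1 but \rho* = 2, so the skew-free one-round load m/p is much smaller than the worst-case bound m/p^{1/2}; that gap is exactly what skewed instances cost.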


    Speaker Bio :

    Bas Ketsman is a Ph.D. candidate at Hasselt University, Belgium, advised by Prof. Frank Neven. He is a PhD Fellow of the Research Foundation Flanders (FWO) and a member of the research group Databases and Theoretical Computer Science. His research interests include database theory and cloud computing.


  • April 3 2017
    Dhalion - Self-Regulating Stream Processing in Heron
    Avrilia Floratou
    2310 CS 12:15 PM
    Abstract :

    In recent years, there has been an explosion of large-scale real-time analytics needs and a plethora of streaming systems have been developed to support such applications. These systems are able to continue stream processing even when faced with hardware and software failures. However, these systems do not address some crucial challenges facing their operators: the manual, time-consuming and error-prone tasks of tuning various configuration knobs to achieve service level objectives (SLOs) as well as the maintenance of SLOs in the face of sudden, unpredictable load variation and hardware or software performance degradation. In this talk, we introduce the notion of self-regulating streaming systems and the key properties that they must satisfy. We then present the design and evaluation of Dhalion, a system that provides self-regulation capabilities to underlying streaming systems. We describe our implementation of the Dhalion framework on top of Twitter Heron, as well as a number of policies that automatically reconfigure Heron topologies to meet throughput SLOs, scaling resource consumption up and down as needed. We are in the process of open-sourcing Dhalion as part of the Heron project.


    Speaker Bio :

    Avrilia Floratou is a Senior Scientist at Microsoft's Cloud and Information Services Lab (CISL). Her research interests broadly lie in the area of data management with a recent focus on large-scale stream processing and real-time analytics. She is also an active contributor to Heron, collaborating with Twitter. Prior to her current role, she was a research scientist at IBM Almaden Research Center, working on SQL-on-Hadoop engines and incorporating her research in the IBM Big SQL product offering. She received her Ph.D. and M.Sc. in Computer Science from the University of Wisconsin-Madison and her B.S. from the University of Athens in Greece.


  • March 27 2017
    Detecting Errors in Labelled Data For Entity Matching
    Haojun Zhang
    2310 CS 12:15 PM
    Abstract :

    Labelled data are widely used in many areas, especially in tasks that use supervised machine learning. For example, in the entity matching pipeline, labelled data are used to debug the blocking step and to train and evaluate matchers and the whole pipeline. With the emergence of deep learning, we can expect more and more large labelled datasets. However, labelled data are usually collected by manual, automatic, or semi-automatic approaches. Such approaches are error-prone, which in turn affects the usage of labelled data for various tasks.

    In this talk, I will present an ongoing research project about detecting label errors in such datasets for entity matching. We develop an iterative tool to help the analyst find and correct the label errors. In each iteration, our tool will search for suspicious labels, rank them and return the top-k to the analyst for manual verification. Then our tool uses the verified labels as feedback to update the searching and ranking for the next iteration. Our experiments on both real and synthetic datasets show that our tool can help the analyst find many label errors by only verifying a small subset of examples in the dataset. This is joint work with Fatemah Panahi, AnHai Doan, and Jeff Naughton.


    Speaker Bio :

    Haojun is a graduate student in the Database Systems Group working with Prof. AnHai Doan. His primary research interest is in entity matching and data cleaning, focusing on how to combine techniques such as machine learning, optimization and crowdsourcing to improve entity matching and data cleaning.


  • March 13 2017
    THEMIS - Fairness in Federated Stream Processing under Overload (SIGMOD 2016)
    Harshad Deshmukh
    2310 CS 12:15 PM
    Abstract :

    Federated stream processing systems, which utilise nodes from multiple independent domains, can be found increasingly in multi-provider cloud deployments, internet-of-things systems, collaborative sensing applications and large-scale grid systems. To pool resources from several sites and take advantage of local processing, submitted queries are split into query fragments, which are executed collaboratively by different sites. When supporting many concurrent users, however, queries may exhaust available processing resources, thus requiring constant load shedding. Given that individual sites have autonomy over how they allocate query fragments on their nodes, it is an open challenge how to ensure global fairness on processing quality experienced by queries in a federated scenario.

    We describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. It executes queries in a globally fair fashion and provides users with constant feedback on the experienced processing quality for their queries. THEMIS associates stream data with its source information content (SIC), a metric that quantifies the contribution of that data towards the query result, based on the amount of source data used to generate it. We provide the BALANCE-SIC distributed load shedding algorithm that balances the SIC values of result data. Our evaluation shows that the BALANCE-SIC algorithm yields balanced SIC values across queries, as measured by Jain’s Fairness Index. Our approach also incurs a low execution time overhead.


    Speaker Bio :

    Harshad is a PhD candidate in the Database Systems Group working with Prof. Jignesh M. Patel on the Quickstep project. His primary research interest is in resource management for in-memory database systems, specifically in designing query schedulers.


  • February 6 2017
    LIP - Looking Ahead Makes Query Plans Robust
    Jianqiao Zhu, Navneet Potti
    2310 CS 12:15 PM
    Abstract :

    Query optimizers and query execution engines cooperate to deliver high performance on complex analytic queries. Typically, the optimizer searches through the plan space and sends a selected plan to the execution engine. However, optimizers may at times miss the optimal plan, sometimes with disastrous impact on performance. In this paper, we develop the notion of robustness of a query evaluation strategy with respect to a space of query plans. We also propose a novel query execution strategy called Lookahead Information Passing (LIP) that is robust with respect to the space of (fully pipeline-able) left-deep query plan trees for in-memory star schema data warehouses. LIP ensures that execution times for the best and the worst case plans are far closer than without LIP. In fact, under certain assumptions of independent and uniform distributions, any plan in that space is theoretically guaranteed to execute in near-optimal time. We also evaluate these claims using workloads that include skew and correlation. With LIP we make an initial foray into a novel way of thinking about robustness from the perspective of query evaluation, where we develop strategies (like LIP) that collapse plan sub-spaces in the overall global plan space.
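
    As a toy illustration of the LIP idea (the tables, keys, and the use of plain sets in place of Bloom filters are my own simplifications): each filtered dimension table contributes a filter of its qualifying keys, and the fact-table scan probes all filters up front, so most non-qualifying rows are discarded regardless of which join order the optimizer picked.

    ```python
    # Hypothetical star-schema fact table: (date_key, item_key, store_key, revenue)
    fact = [
        (1, 10, 100, 5.0), (2, 11, 100, 7.0), (3, 10, 101, 2.0),
        (1, 12, 102, 9.0), (2, 10, 100, 4.0),
    ]

    # One filter per (already filtered) dimension table, keyed by the fact
    # column it applies to. Sets stand in for Bloom filters here.
    filters = {
        0: {1, 2},      # qualifying date keys
        1: {10, 11},    # qualifying item keys
        2: {100},       # qualifying store keys
    }

    # Probe every filter during the scan, before any join work is done.
    surviving = [
        row for row in fact
        if all(row[col] in keys for col, keys in filters.items())
    ]
    print(surviving)
    ```

    In a real implementation the filters are Bloom filters (cheap, compact, with false positives but no false negatives), and their probe order can be adapted at runtime to apply the most selective filter first, which is what makes execution time largely insensitive to the chosen join order.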


    Speaker Bio :

    Jianqiao Zhu is a 5th year Ph.D. student in the database group at UW-Madison, working with Prof Jignesh Patel. His research interests include query languages, query optimization, query processing system design and data visualization.

    Navneet Potti is a 4th-year graduate student at UW-Madison, working with Prof Jignesh Patel on the Quickstep project. His primary research interests broadly lie in all aspects of developing high-performance data analytics platforms, ranging from main memory database engines to machine learning. He is currently affiliated with Microsoft Gray Systems Lab and has spent summers doing internships at IBM Almaden as well as at Pivotal.


  • January 23 2017
    Organizational Meeting
    None
    2310 CS 12:15 PM
    Abstract :

    Decide the logistics of the spring 2017 database seminar.


Past Schedule

  • November 28 2016
    CyborgCleaner - Toward End-to-End Data Cleaning with Machine and Human
    Adel Ardalan
    2310 CS 12:15 PM
    Abstract :

    Most current data cleaning solutions solve only certain steps of the cleaning pipeline, not the entire pipeline. For example, many cleaning pipelines consist of two parts: a machine part that employs a cleaning algorithm, and a human part that employs a user to manually clean up the output of the algorithm. Current work has addressed only the machine part. This raises three serious problems. First, without guidance, users often end up doing something ad-hoc, suboptimal, and incorrect. Second, the machine part may optimize for the wrong goal, causing more work in the human part. Finally, focusing only on the machine part makes it very difficult to address the brittleness of the cleaning algorithm.

    To address these problems, we argue for more attention to end-to-end data cleaning, focusing not just on the machine, but also on the human part. We propose an RDBMS-style solution strategy, which defines basic human operations, combines them with cleaning algorithms to form hybrid plans, estimates plan costs (e.g., in terms of human effort), then selects the best plan. As a case study, we examine the important problem of value normalization and develop an effective solution. Our work thus advocates for end-to-end data cleaning and shows that it is possible to apply an RDBMS-style approach to build complex machine-human solutions to this problem.


  • November 21 2016
    Design of Arbitrage Free Data Pricing Schemes
    Shaleen Deep
    2310 CS 12:15 PM
    Abstract :

    Motivated by a growing market that involves buying and selling data over the web, we study pricing schemes that assign value to queries issued over a database. Previous work studied pricing mechanisms that compute the price of a query by extending a data seller's explicit prices on certain queries, or investigated the properties that a pricing function should exhibit without detailing a generic construction. In this work, we present a formal framework for pricing queries over data that allows the construction of general families of pricing functions, with the main goal of avoiding arbitrage. We consider two types of pricing schemes: instance-independent schemes, where the price depends only on the structure of the query, and answer-dependent schemes, where the price also depends on the query output. Our main result is a complete characterization of the structure of pricing functions in both settings, by relating it to properties of a function over a lattice.


    Speaker Bio :

    Shaleen is a second year grad student working with Prof. Paris Koutris. His interest is in the area of database theory.


  • November 14 2016
    Artificial Intelligence and Machine Learning Innovation at Intuit
    Bharath Kadaba, Michael J. Radwin and Pari Lingampally
    2310 CS 12:15 PM
    Abstract :

    Leaders from Intuit's Innovation & Advanced Technology Group look at the horizon in Artificial Intelligence and share what Intuit is doing in the space. With 37 million customers across personal finance, accounting, and tax products, Intuit is uniquely positioned to address this topic. Not only did Intuit use AI to create the "Tax Knowledge Engine", a system that understands the American tax code by analyzing the thirty million tax returns facilitated through TurboTax annually, but the company is also exploring a wide variety of new modes of artificial intelligence and conversational user interfaces.


    Speaker Bio :

    Bharath Kadaba leads Intuit's Innovation and Advanced Technology group and serves as Intuit's Chief Innovation Officer. During Bharath's eight years with Intuit, he has led various teams across Intuit in exploring and creating right-for-me offerings for Global Small Business, Accountants and Consumers. Prior to Intuit, he was the SVP of Media Engineering at Yahoo, VP/GM of Application Integration at Siebel Systems, and led engineering at two high profile Internet startups in Silicon Valley. Early in his career he led research and development on Gigabit Internet Routers at the IBM T.J. Watson Labs. Bharath earned a Ph.D. in Information Theory and Networks from the University of Hawaii.

    Michael J. Radwin leads Intuit's Innovation & Advanced Technology Data Science team, where he turns forward-looking data ideas into working prototypes and products. Prior to Intuit, Radwin was VP Engineering of Anchor Intelligence, which used machine learning ensemble methods to fight online advertising fraud. Earlier, Radwin was Director of Engineering at Yahoo!, where he built ad-targeting and personalization algorithms with neural networks and naive Bayesian classifiers, and scaled web platform technologies Apache and PHP.

    Pari Lingampally is an alumnus of UW-Madison who graduated with a degree in Computer Engineering in 2014. At Intuit, he has worked on multiple projects, ranging from a novel identity management system to an in-production algorithm used in TurboTax to help catch user-typed errors. More recently, he has worked on a project that experimented with using Google's TensorFlow to do user prediction based on TurboTax clickstream data.


  • November 7 2016
    Database Systems Meet Non-Volatile Memory (NVRAM)
    Per-Ake (Paul) Larson
    2310 CS 12:15 PM
    Abstract :

    Byte addressable, non-volatile memory (NVRAM) with close to DRAM speed is becoming a reality. Low capacity DIMMs (10s of MBs) are already available and high capacity DIMMs (100s of GB) are expected in 2017. This talk is about how database systems, in particular main-memory databases, can benefit from NVRAM. It will begin with an outline of the characteristics of different types of NVRAM and how operating systems provide applications access to NVRAM. Ensuring that data structures such as indexes in NVRAM can be recovered to a consistent state without data or memory loss after a crash is challenging. The talk will discuss what causes the difficulties and how they can be overcome. It will then show how NVRAM can be used to greatly reduce the latency of commit processing and replication. By storing a main-memory database, including indexes, in NVRAM it is possible to achieve near-instant recovery after a crash. The final part of the talk will discuss how this can be achieved.


    Speaker Bio :

    Paul has conducted research in the database field for over 35 years. He served as a Professor in the Department of Computer Science at the University of Waterloo for 15 years and as a Principal Researcher at Microsoft Research for close to 20 years. Paul is a Fellow of the ACM. He has worked in a variety of areas - file structures, materialized views, query processing, query optimization, column stores, and main-memory databases among others. Paul collaborated closely with the SQL Server team to drastically improve SQL Server performance by adding column store indexes, a novel main-memory engine (Hekaton), and support for real-time analytics.


  • October 28 2016
    Data Systems that are Easy to Design, Tune and Use
    Stratos Idreos
    2310 CS 11:00 AM
    Abstract :

    How far away are we from a future where a data management system sits in the critical path of everything we do? Already today we often go through a data system in order to do several basic tasks, e.g., to pay at the grocery store, to book a flight, to find out where our friends are and even to get coffee. Businesses and sciences are increasingly recognizing the value of storing and analyzing vast amounts of data. Other than the expected path towards an exploding number of data-driven businesses and scientific scenarios in the next few years, in this talk we also envision a future where data becomes readily available and its power can be harnessed by everyone. What both scenarios have in common is a need for new kinds of data systems that "just work"; they are easy to design, easy to tune, and easy to use. We will discuss this vision and our recent efforts towards 1) adaptive data systems that can adapt to data and access patterns on-the-fly, 2) self-designing data systems that make it easy to spin-off and test new data system architectures and 3) curious data systems that make it easy to explore data even if we do not know what queries to ask.


    Speaker Bio :

    Stratos Idreos is an assistant professor of Computer Science at Harvard University where he leads DASlab, the Data Systems Laboratory at Harvard SEAS. Stratos works on data system architectures with emphasis on how we can make it easy to design efficient data systems as applications and hardware keep evolving and on how we can make it easy to use these systems even for non-experts. For his doctoral work on Database Cracking, Stratos won the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011 ERCIM Cor Baayen award. He is also a recipient of an IBM zEnterprise System Recognition Award, a VLDB Challenges and Visions best paper award and an NSF Career award. In 2015 he was awarded the IEEE TCDE Early Career Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems.


  • October 24 2016
    Learning-based Cost Management for Cloud Databases
    Olga Papaemmanouil
    2310 CS 12:15 PM
    Abstract :

    Cloud computing has become one of the most active areas of computer science research, in large part because it allows computing to behave like a general utility that is always available on demand. While existing cloud infrastructures and services reduce significantly the application development time, significant effort is still required by cloud data management applications to manage their monetary cost, for often this cost depends on a number of decisions including but not limited to performance goals, resource provisioning and workload allocation. These tasks depend on the application-specific workload characteristics and performance objectives and today their implementation burden is left on the application developers.
    We argue for a substantial shift away from human-crafted solutions and towards leveraging machine learning algorithms to address the above challenges. These algorithms can be trained on application-specific properties and customized performance goals to automatically learn how to provision resources as well as schedule the execution of incoming query workloads with low cost. Towards this vision, we have developed WiSeDB, a learning-based cost management service for cloud-deployed data management applications. In this talk, I will discuss how WiSeDB uses (a) supervised learning to automatically learn cost-effective models for guiding query placement, scheduling, and resource provisioning decisions for batch processing, and (b) reinforcement learning to offer low-cost online processing solutions, while being adaptive to resource availability and decoupled from notoriously inaccurate performance prediction models.


    Speaker Bio :

    Olga Papaemmanouil has been an Assistant Professor in the Department of Computer Science at Brandeis University since January 2009. Her research interest lies in the area of data management with a recent focus on cloud databases, data exploration, query optimization and query performance prediction. She is the recipient of an NSF Career Award (2013), an ACM SIGMOD Best Demonstration Award (2005) and a Paris Kanellakis Fellowship from Brown University (2002). She received an undergraduate degree in Computer Engineering and Informatics from the University of Patras, Greece, in 1999, an Sc.M. in Information Systems from the University of Economics and Business in Athens, Greece, in 2002, and a Ph.D. in Computer Science from Brown University in 2008.


  • October 17 2016
    Self-diagnosing and self-healing indexes
    Goetz Graefe
    2310 CS 12:15 PM
    Abstract :

    Work done by Goetz Graefe and Harumi Kuno at HP, and Bernhard Seeger (U Marburg).
    The integrity of B-tree structures can become compromised for many reasons. Even if all programming errors in a database management system could be avoided, failures in the various layers of hardware and software beneath the database kernel can result in inconsistencies. Since these inconsistencies manifest themselves in unpredictable and sometimes disastrous ways, all commercial database management systems include mechanisms to verify the integrity and trustworthiness of an individual index and of a set of related indexes, and all vendors recommend index verification as part of regular database maintenance similar to regular backups.
    Instead, or in addition, we suggest verifying the B-tree structure in each traversal. With carefully designed tree structure and node contents, a root-to-leaf pass can verify all nodes along its path comprehensively, i.e., it can verify all B-tree invariants including its consistency constraints with respect to its siblings and neighbors. Thus, instead of verifying the B-tree structure and contents offline, i.e., instead of the traditional approach, normal application execution verifies the structure frequently and efficiently. Frequent verification and fast access to log records permit automatic, reliable, and efficient recovery of the correct, up-to-date page contents. Our contribution is an index structure that gives reliable and efficient access to all relevant log records and thus enables root cause analysis during testing and automatic recovery after deployment.
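
    A common building block for in-traversal verification is fence keys, which the following toy sketch illustrates (the node layout and checks are my own simplification, not the exact design from the talk): each node stores low/high fence keys bounding its key range, and a root-to-leaf descent checks that every visited node is internally consistent and that each child's fences match the range its parent routed into.

    ```python
    # Minimal B-tree node: fence keys plus separator keys (internal) or
    # entries (leaf). Real nodes would also carry checksums, sibling
    # pointers, etc.
    class Node:
        def __init__(self, lo, hi, keys, children=None):
            self.lo, self.hi = lo, hi   # fence keys bounding this node's range
            self.keys = keys
            self.children = children    # None for a leaf

    def descend(node, key):
        # Verify this node's invariants as a side effect of the lookup.
        assert node.lo <= key < node.hi, "search strayed outside fences"
        assert node.keys == sorted(node.keys), "keys out of order"
        assert all(node.lo <= k < node.hi for k in node.keys), "key outside fences"
        if node.children is None:
            return key in node.keys
        i = sum(k <= key for k in node.keys)          # child slot to follow
        child = node.children[i]
        # The child's fences must equal the range implied by the parent.
        want_lo = node.lo if i == 0 else node.keys[i - 1]
        want_hi = node.hi if i == len(node.keys) else node.keys[i]
        assert (child.lo, child.hi) == (want_lo, want_hi), "fence mismatch"
        return descend(child, key)

    inf = float("inf")
    root = Node(-inf, inf, [10], [
        Node(-inf, 10, [3, 7]),
        Node(10, inf, [10, 15]),
    ])
    print(descend(root, 7))    # True, and the whole path was verified
    ```

    Any corruption along the search path, such as a child whose fences disagree with the parent's separators, trips an assertion during an ordinary lookup rather than lying dormant until an offline verification run.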


    Speaker Bio :

    Goetz Graefe recently joined Google's Madison office after almost 10 years with HP Labs, over 10 years with Microsoft SQL Server, and 7 years in academia. He graduated with an MS in 1984 and a PhD in 1987, both at UW-Madison. He has published about database query optimization, query execution, indexing, transactions, recovery, and concurrency control; and has meandered all over from the ivory tower to release management.
    The work presented today was done at HP Labs.


  • October 10 2016
    Speeding up Machine Learning using Graphs and Codes
    Dimitris Papailiopoulos
    4310 CS 12:15 PM
    Abstract :

    I will talk about three simple combinatorial ideas to speed up parallel and distributed learning algorithms. We will start off with serial equivalence in asynchronous parallel machine learning, its significance, and how we can efficiently achieve it using a recent result on graph phase transitions. We will continue on the issue of stragglers (i.e., slow nodes) in distributed systems; here, we will use erasure codes to robustify gradient based algorithms against delays. In our last example, we will reduce the high communication cost of data-shuffling in distributed learning, using the seemingly unrelated notion of coded caching. For all the above, we will see real world experiments that indicate how these simple ideas can significantly improve the speed and accuracy of large-scale learning.


    Speaker Bio :

    I am an assistant professor of Electrical and Computer Engineering at the University of Wisconsin-Madison, and a faculty affiliate with the Optimization group at the Wisconsin Institute for Discovery. My research lies in the intersection of machine learning, coding theory, and distributed systems. I am particularly interested in coordination-avoiding algorithms for large-scale machine learning, and on using erasure codes to speed up distributed computation.

    Before coming to Madison, I spent two wonderful years as a postdoc at UC Berkeley. At Berkeley, I was a member of the AMPLab and BLISS, and had the pleasure to collaborate with Kannan Ramchandran and Ben Recht, on several fun projects. I received my Ph.D. in 2014 from UT Austin, where I was fortunate to be advised by Alex Dimakis. Before UT, I spent 3.5 years as a grad student at USC. Before all that, I received my M.Sc. (2009) and ECE Diploma (2007) from the Technical University of Crete (TUC), located in the beautiful city of Chania.


  • October 3 2016
    Toward a System Building Agenda for Data Integration
    AnHai Doan
    2310 CS 12:15 PM
    Abstract :

    Data Integration (DI) has long been an important topic in data management, and will become even more so in the age of Big Data and data science. Most DI work so far has focused on developing algorithms. In this talk I argue that we should devote far more effort to building DI systems, in order to truly advance the field. Toward this goal, I begin by drawing on my recent industrial experience to discuss the limitations of current DI systems. Next, I propose an agenda to build a new kind of DI system to address these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the "pain points" of the steps, and these tools are built on top of the Python data science and big data ecosystem (PyData). I discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. As an example of such systems, I discuss "hands-off" DI systems, which solve the entire DI task using only crowdsourcing. Finally, I discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and that building them raises many interesting research challenges. If time permits, I will discuss how this DI agenda can be built on to develop a broader R&D agenda for data science.


  • September 26 2016
    Ava - From data to insights through conversations
    Rogers Jeffrey Leo John and Navneet Potti
    2310 CS 12:15 PM
    Speaker Bio :

    Rogers is a second year grad student working with Dr. Jignesh Patel. His interests are broadly in the areas of Machine Learning, NLP, and Data Science. He hopes to add some more to this bio during his time at Wisconsin.

    Navneet Potti is a 4th year graduate student at UW-Madison, working with Prof Jignesh Patel on the Quickstep project. His primary research interests broadly lie in all aspects of developing high performance data analytics platforms, ranging from algorithmic problems in approximate computation and robust query optimization to engineering challenges in storage and compute optimization. He also works on developing systems to simplify data management and democratize data analytics. He graduated from IIT Madras in 2010 and then worked for Goldman Sachs until 2013. He is currently affiliated with Microsoft Gray Systems Lab and has spent summers doing internships at IBM Almaden, working on their SQL-on-Hadoop system Big SQL, as well as at Pivotal, working on their Greenplum database.


  • September 19 2016
    Current research projects at UW-Madison
    Paris Koutris
    2310 CS 12:15 PM
    Abstract :

    In this talk, I will give a brief overview of two research topics: (1) How can we price relational data? (2) How can we design parallel algorithms for query processing tasks? If time permits, I will also talk about one of my favorite (and still open) problems: finding a fast parallel algorithm for computing connected components in a graph. Blackboard only!


    Speaker Bio :

    I am an assistant professor who joined the Department of Computer Sciences in fall 2015. My research focuses on the theoretical aspects of data management and, more specifically, on problems that arise in modern applications and systems for processing big data. I am particularly interested in theoretical models for parallel query processing, uncertainty in data management and pricing data. Prior to moving to Madison, I completed my Ph.D. in Computer Science and Engineering at the University of Washington, supervised by Dan Suciu. Before that, I received a Diploma in Electrical and Computer Engineering from the National Technical University of Athens and a M.Sc. Degree in Logic, Algorithms and Computation from the University of Athens.