Univ.-Prof. Dipl.-Ing. Nikolaus Augsten, PhD

Professor

Head of the Database Research Group

ORCID 0000-0002-3036-6201

Department of Computer Science

University of Salzburg

Jakob-Haringer-Str. 2 (Office 1.16)

5020 Salzburg, Austria

Tel: +43-(0)662-8044-6347

Fax: +43-(0)662-8044-172

Email: nikolaus.augsten@plus.ac.at

I was previously affiliated with the Faculty of Computer Science at the Free University of Bozen-Bolzano. In 2010/2011 I visited Prof. Alfons Kemper at Technische Universität München (TUM, Munich, Germany) for 6 months. In 2005/2006 I spent 6 months at Washington State University with Prof. Curtis Dyreson (now at Utah State University). I received my PhD from Aalborg University, Denmark, in 2008. My supervisor was Prof. Michael Böhlen (University of Zurich).

Main Research Interests

My current research interests include data-centric applications in database and information systems with a particular focus on approximate matching techniques for complex data structures, efficient index structures for distance computations, and similarity search in massive data collections. My research is triggered by problems that arise in concrete applications, for example, e-government and XML search engines.

Source Code Downloads

Set Similarity Join Algorithms: C++ source code for many set similarity join algorithms. We used this source code in our experimental evaluation of set similarity join algorithms (PVLDB 2016).
Approximate Tree Matching Library: pq-gram distance and other tree distances (Java source code).
Tree Edit Distance code: implementation of all important tree edit distance algorithms (RTED, Demaine, Klein, Zhang-Shasha) with detailed documentation (Java source code).
Repeatability package including source code, the Bolzano Address Tree dataset, and all other datasets used in the paper "The pq-Gram Distance between Ordered Labeled Trees" (TODS 2010).

Courses at other Universities

Lab Database Systems (Free University of Bozen-Bolzano)

Similarity Search (Free University of Bozen-Bolzano): This course discusses similarity search techniques for flat strings and hierarchical data (for example, XML). Selected methods are presented, and their effectiveness and efficiency are discussed. Filtering techniques that improve efficiency are introduced. The students implement similarity joins in a relational database management system.

Database Management and Tuning (Free University of Bozen-Bolzano): This course gives an in-depth understanding of the features that off-the-shelf database management systems offer, in particular with respect to system performance. This knowledge is used to tune the database system and its environment: size the hardware for the database system, write efficient queries, choose effective indexes, communicate with the database efficiently, and diagnose performance problems.

Scalable Similarity Search Algorithms (Technische Universität München, WS 2010)

Approximation: Theory and Algorithms (Free University of Bozen-Bolzano): This course discusses approximate matching techniques for flat strings and hierarchical data. Selected methods are presented, and their effectiveness and efficiency are discussed. Filtering techniques that improve efficiency are introduced. The students implement approximate matching techniques in a relational database management system.

Selected Publications

2024

Manuel Widmoser, Daniel Kocher, Nikolaus Augsten. Scalable Distributed Inverted List Indexes in Disaggregated Memory. Proc. ACM Manag. Data, 2(3), 171, 2024. PDF

2023

Daniel Ulrich Schmitt, Daniel Kocher, Nikolaus Augsten, Willi Mann, Alexander Miller. A Two-Level Signature Scheme for Stable Set Similarity Joins. Proc. VLDB Endow., 16(11), 2686–2698, 2023. PDF
George Papadakis, Marco Fisichella, Franziska Schoger, George Mandilaras, Nikolaus Augsten, Wolfgang Nejdl. Benchmarking Filtering Techniques for Entity Resolution. 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, 653–666, 2023. PDF
Konstantin Emil Thiel, Daniel Kocher, Nikolaus Augsten, Thomas Hütter, Willi Mann, Daniel Ulrich Schmitt. FINEX: A Fast Index for Exact & Flexible Density-Based Clustering. Proc. ACM Manag. Data, 1(1), 71:1–71:25, 2023. PDF
Manuel Widmoser, Daniel Kocher, Nikolaus Augsten, Willi Mann. MetricJoin: Leveraging Metric Properties for Robust Exact Set Similarity Joins. 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, 1045–1058, 2023. PDF
Pranay Mundra, Jianhao Zhang, Fatemeh Nargesian, Nikolaus Augsten. Koios: Top-k Semantic Overlap Set Search. 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023, 1531–1543, 2023. PDF

2020

Oksana Dolmatova, Nikolaus Augsten, Michael H. Böhlen. A Relational Matrix Algebra and its Implementation in a Column Store. Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, 2573–2587, 2020. PDF

2019

Daniel Kocher, Nikolaus Augsten. A Scalable Index for Top-k Subtree Similarity Queries. Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, 1624–1641, 2019. PDF
Thomas Hütter, Mateusz Pawlik, Robert Loschinger, Nikolaus Augsten. Effective Filters and Linear Time Verification for Tree Similarity Joins. 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019, 854–865, 2019. PDF

2018

Fabian Fier, Nikolaus Augsten, Panagiotis Bouros, Ulf Leser, Johann-Christoph Freytag. Set Similarity Joins on MapReduce: An Experimental Survey. Proc. VLDB Endow., 11(10), 1110–1122, 2018. PDF

2016

Willi Mann, Nikolaus Augsten, Panagiotis Bouros. An Empirical Evaluation of Set Similarity Join Techniques. Proc. VLDB Endow., 9(9), 636–647, 2016. PDF

2015

Mateusz Pawlik, Nikolaus Augsten. Efficient Computation of the Tree Edit Distance. ACM Trans. Database Syst., 40(1), 3:1–3:40, 2015. PDF

2014

Nikolaus Augsten, Armando Miraglia, Thomas Neumann, Alfons Kemper. On-the-fly token similarity joins in relational databases. International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, 1495–1506, 2014. PDF

2012

Benjamin Gufler, Nikolaus Augsten, Angelika Reiser, Alfons Kemper. Load Balancing in MapReduce Based on Scalable Cardinality Estimates. IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1-5 April, 2012, 522–533, 2012. PDF
Nikolaus Augsten, Michael H. Böhlen, Curtis E. Dyreson, Johann Gamper. Windowed pq-grams for approximate joins of data-centric XML. VLDB J., 21(4), 463–488, 2012. PDF
Sven Helmer, Nikolaus Augsten, Michael H. Böhlen. Measuring structural similarity of semistructured data based on information-theoretic approaches. VLDB J., 21(5), 677–702, 2012. PDF

2011

Mateusz Pawlik, Nikolaus Augsten. RTED: A Robust Algorithm for the Tree Edit Distance. Proc. VLDB Endow., 5(4), 334–345, 2011. PDF
Nikolaus Augsten, Denilson Barbosa, Michael H. Böhlen, Themis Palpanas. Efficient Top-k Approximate Subtree Matching in Small Memory. IEEE Trans. Knowl. Data Eng., 23(8), 1123–1137, 2011. PDF

2010

Nikolaus Augsten, Denilson Barbosa, Michael H. Böhlen, Themis Palpanas. TASM: Top-k Approximate Subtree Matching. Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA, 353–364, 2010. PDF
Nikolaus Augsten, Michael H. Böhlen, Johann Gamper. The pq-gram distance between ordered labeled trees. ACM Trans. Database Syst., 35(1), 4:1–4:36, 2010. PDF

2008

Nikolaus Augsten, Michael H. Böhlen, Curtis E. Dyreson, Johann Gamper. Approximate Joins for Data-Centric XML. Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico, 814–823, 2008. PDF

2006

Nikolaus Augsten, Michael H. Böhlen, Johann Gamper. An Incrementally Maintainable Index for Approximate Lookups in Hierarchical Data. Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006, 247–258, 2006.

2005

Nikolaus Augsten, Michael H. Böhlen, Johann Gamper. Approximate Matching of Hierarchical Data Using pq-Grams. Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, 301–312, 2005. PDF