DEV Community

Cover image for Data Structures and Algorithms on Strings
Veeranki Phani Sirisha
Veeranki Phani Sirisha

Posted on

3 1 1 1 1

Data Structures and Algorithms on Strings

Day 1: String Algorithms Overview

The first day of the ACM Winter School on DSA Strings began with an insightful lecture by Dr. Venkatesh Raman on essential string processing algorithms, including:

Asymptotic Analysis: Importance of Big-O notation and time/space complexity.
KMP Algorithm: Efficient pattern matching using the prefix function.
Boyer-Moore Algorithm: Focused on the bad character and good suffix rules.
Pattern Matching Techniques: Efficient substring search.
LCS & Edit Distance: Dynamic programming solutions for LCS and applications of edit distance in fields like bioinformatics.

Sriram Bhyravarapu led a hands-on tutorial where participants implemented the KMP and Boyer-Moore algorithms, practiced pattern matching, and explored dynamic programming with LCS and edit distance.

Day 2: Advanced String Algorithms

The second day focused on advanced topics such as wildcard matching, mismatches, and suffix trees:

Wildcard Matching: Algorithms for handling * and ? characters in patterns.
Pattern Matching with Mismatches: Techniques for approximate pattern matching.
Suffix Trees: Construction and applications in substring search, longest repeated substring, and lexicographical analysis.

Sriram Bhyravarapu guided the hands-on implementation of wildcard matching, approximate pattern matching, and suffix trees, giving participants practical experience with these
advanced techniques.

Day 3: Suffix Arrays and Their Applications

Dr. Sharma Thankachan's lecture focused on the fundamentals of suffix arrays, including their construction (Manber-Myers and Kasai's LCP algorithm) and applications in pattern matching, data compression, and bioinformatics.

Sriram Bhyravarapu guided participants through hands-on implementation of suffix arrays, LCP arrays, and applications like substring searches, longest repeated substring identification, and Burrows-Wheeler Transform (BWT) for compression tasks.

Day 4: Advanced String Data Structures and Queries

Dr. Thankachan discussed advanced applications of suffix trees and arrays in text indexing, bioinformatics, and data compression. He also introduced K-th lexicographical substring queries and Range Minimum Queries (RMQ).

Sriram led coding exercises on K-th queries, RMQs, and their integration with suffix structures. Participants worked on real-world problems involving substring ranking, pattern matching, and optimization.

The combination of theory and practical coding exercises provided participants with a solid understanding of advanced string algorithms for solving complex real-world problems.

Day 5: Rank-Select Operations, Wavelet Trees, and Applications

Lecture by Dr. Chirag Jain:
Introduced rank (count of occurrences) and select (position of K-th occurrence) operations.
Explained wavelet trees for efficient sequence representation and supporting queries.
Discussed real-world applications in text compression, pattern matching, and bioinformatics.

Tutorial by Sriram Bhyravarapu:
Hands-on implementation of rank-select operations and wavelet tree construction.
Solved practical problems like substring frequency and range queries.

Key Takeaways:
Gained practical skills in rank-select operations and wavelet trees for computational problems.

Day 6: Burrows-Wheeler Transform (BWT), FM-Index, and Applications

Lecture by Dr. Chirag Jain:
Introduced BWT for text compression and FM-Index for efficient substring matching.
Discussed applications in text indexing and genomic data analysis.

Tutorial by Sriram Bhyravarapu:
Implemented BWT and FM-Index, with exercises in pattern matching and text compression.

Key Takeaways:
Mastered BWT and FM-Index, essential for text processing and genomic analysis.

Day 7: Locality-Sensitive Hashing (LSH), Bloom Filters, and Applications

Lecture by Dr. Naveen Sivadasan:
Introduced LSH for approximate nearest neighbor searches and Bloom Filters for efficient set membership testing.
Discussed applications in image similarity, web indexing, and network security.

Tutorial by Sriram Bhyravarapu:
Implemented LSH and Bloom Filters, with exercises on query optimization and false positive rates.

Key Takeaways:
Learned advanced techniques for efficient data retrieval and set operations using LSH and Bloom Filters.

Day 8: Applications and Closing Ceremony

Session by Dr. Naveen Sivadasan:
Explored real-world applications of LSH and Bloom Filters in data deduplication, genomic data analysis, and recommendation systems.
Highlighted their role in bridging theoretical knowledge with practical industry challenges.

Closing Ceremony:
Reflected on the week-long program, emphasizing the advanced concepts, hands-on skills, and real-world applications covered.
Celebrated participants’ achievements and growth in tackling complex computational problems.

Key Takeaway:
The final day reinforced the practical relevance of advanced algorithms and marked a fitting conclusion to an intensive and enriching learning journey.

Certificate pic

Celebration

Image of Timescale

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read more

Top comments (0)

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more