Fully-Faltoo blog by Pratyush

17th Dec. 2022

"Search everywhere" on Screener

We recently added a "search everywhere" feature on Screener. It allows us to do a full-text search across the concall transcripts, announcements and key insights of all the listed companies.

This was our second attempt at it.

There are various ways to implement full-text search:
- Using a full-text index in a relational database such as MySQL or PostgreSQL
- Using services such as ElasticSearch, Algolia, TypeSense or Meilisearch
- Using text search libraries such as Lucene or Tantivy

We tried full-text search using MySQL in our first attempt. This time we switched to Tantivy. Here is what we learned.

Why didn't MySQL full-text search work?

Using full-text search within MySQL offers a huge advantage. We don't need to "sync" the changes. The index is always up to date because all the updates, deletions and additions are made in the same database.

The problems are:
- "Phrase searches" are practically unusable. Queries can take over a minute to run.
- Words shorter than 3 characters were not indexed. In our case, it missed important terms like "EV".
- It doesn't work across tables. Concalls, announcements and documents are separate tables with different schemas.
- It doesn't support stemming. We had a workaround, but the results were sometimes inaccurate.
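
To make the short-word problem concrete, here is a minimal Python sketch of an indexer that skips tokens below a minimum length. It is a toy model of the behaviour described above, not MySQL's implementation; `index_terms` and `min_token_size` are names made up for this illustration.

```python
import re

def index_terms(text, min_token_size=3):
    """Return the lowercased terms a length-filtering indexer would keep."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    # Anything shorter than the threshold never reaches the index,
    # so no query can ever match it.
    return {t for t in tokens if len(t) >= min_token_size}

doc = "The company plans to launch a new EV platform next year"

print("ev" in index_terms(doc))                    # False
print("platform" in index_terms(doc))              # True
print("ev" in index_terms(doc, min_token_size=2))  # True
```

With the threshold at 3, "EV" is dropped at indexing time, which is why searching for it returns nothing. For InnoDB tables, MySQL exposes this threshold as the `innodb_ft_min_token_size` setting.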

Why did we pick Tantivy?

Tantivy offers a much better full-text search experience. The queries are blazing fast and the results are more accurate. Stemming and search scores are provided out of the box.
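
For readers unfamiliar with the terms, the following Python sketch shows roughly what stemming and scoring mean. It is an assumption-laden toy: the suffix-stripping `stem` function and the term-frequency `score` function are invented here for illustration and are far cruder than what Tantivy actually does.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Crude suffix stripping; real stemmers use far more careful rules."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Lowercase, tokenize and stem a piece of text."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def score(query, document):
    """Sum of how often each stemmed query term occurs in the document."""
    counts = Counter(stems(document))
    return sum(counts[s] for s in set(stems(query)))

docs = [
    "Margins expanded as exports expanded",
    "The board approved a dividend",
]

# Rank documents by score, best match first.
ranked = sorted(docs, key=lambda d: score("expanding margin", d), reverse=True)
print(ranked[0])  # Margins expanded as exports expanded
```

The query "expanding margin" matches a document that only contains "expanded" and "Margins" because both sides are reduced to the same stems before comparison. The score then decides the order of the results.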

But why not ElasticSearch?

ElasticSearch and other such services provide the whole package. They run as a separate service. They include a UI, document parsers, RESTful APIs, plugins, authentication... They offer the whole buffet when we only need a coffee.

In contrast, Tantivy does one thing well. It creates an index of text documents and runs quick searches on it. It doesn't hog the RAM. It doesn't run as a separate service. The search is performed only when we run a query. And it is blazing fast because it is written in Rust. It is a reimagining of Lucene, written from scratch.
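
The idea of an embedded index can be sketched in a few lines of Python. This is a toy inverted index, not Tantivy's API or internals; the class `TinyIndex`, its methods and the sample documents are all hypothetical. It only illustrates two points from this post: documents from different sources can live in one index, and a phrase search becomes a cheap position lookup that runs in-process only when a query is made.

```python
import re
from collections import defaultdict

class TinyIndex:
    """Toy in-process inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}

    def add(self, doc_id, kind, text):
        """Index one document; `kind` records which source it came from."""
        self.docs[doc_id] = (kind, text)
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def phrase(self, query):
        """Return ids of documents containing the query words consecutively."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        hits = []
        for doc_id, starts in self.postings.get(terms[0], {}).items():
            for start in starts:
                # Every following word must sit at the next position.
                if all(
                    start + i in self.postings.get(t, {}).get(doc_id, [])
                    for i, t in enumerate(terms)
                ):
                    hits.append(doc_id)
                    break
        return sorted(hits)

index = TinyIndex()
index.add("concall-1", "concall", "We expect EV sales to double next year")
index.add("announcement-7", "announcement", "Board approves new EV sales target")
index.add("insight-3", "insight", "Sales of EV chargers grew")

print(index.phrase("EV sales"))  # ['announcement-7', 'concall-1']
```

There is no server to run here: the index is just a data structure that the application opens and queries. A real library adds on-disk storage, compression, stemming and relevance scoring on top of this basic shape.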

Underneath, even ElasticSearch and Solr use Lucene, with a lot of complicated layers on top to provide an all-in-one solution. We needed something much simpler.
