This weekend I revisited an old project, Using grep On A Large File. TL;DR: I downloaded an 80+ GB text file dump of emails and passwords. It probably includes some of your past passwords. You can find out at Have I Been Pwned.

Previously, searches on this file took an hour or more with grep. After some optimizations I got that down to 20 minutes. I was proud of this, but it was still too slow to be usable, especially if I wanted to run many searches. After more googling I found bgrep. This ended up being extremely confusing, because I think there are two different programs named bgrep. Fortunately, bgrep did lead me to look, which ultimately solved my problem.
Originally I thought that bgrep used the binary search algorithm. I expected that to solve my issue, since grep scans the file from beginning to end while a binary search only has to inspect a handful of positions. Binary search only works on sorted data, so I sorted the file first with sort -S 95% -o sorted_passwords.txt unsorted_passwords.txt. The -S 95% sped up the process by letting sort use up to 95% of system memory as its buffer. The command still took over 2 hours to complete on my 2017 MacBook Pro. Once the data was sorted, I tried to get whichever version of bgrep I had installed working. While researching bgrep I stumbled across the look command. If you run man look you will see “The look utility displays any lines in file which contain string as a prefix. As look performs a binary search, the lines in file must be sorted.” This is perfect, since I already have the data sorted. I can now type look joshsisto@gmail.com sorted_passwords.txt and it finds my email and password almost instantly! Using the time command, I can see that a lookup takes about 0.7 seconds on average.
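
To make it clearer why look is so much faster than grep here, this is a rough Python sketch of the same idea: a binary search over byte offsets in a sorted file. It is not how look is actually implemented, just an illustration. The function names look_prefix and line_at_or_after are made up for this example, and the sketch assumes the file is sorted in plain byte order (for instance with LC_ALL=C set when sorting) and uses newline-terminated lines.

```python
import os


def line_at_or_after(f, pos):
    """Return the first line that starts at byte offset >= pos (b"" at end of file)."""
    if pos == 0:
        f.seek(0)
    else:
        # Step back one byte and discard the rest of that line, so that we
        # end up exactly at the start of the next full line.
        f.seek(pos - 1)
        f.readline()
    return f.readline()


def look_prefix(prefix, path):
    """Hypothetical helper: return all lines in a byte-sorted file that start with prefix."""
    key = prefix.encode("utf-8")
    matches = []
    with open(path, "rb") as f:
        lo, hi = 0, os.fstat(f.fileno()).st_size
        # Binary search for the smallest offset whose following line is >= key.
        while lo < hi:
            mid = (lo + hi) // 2
            line = line_at_or_after(f, mid)
            if line and line.rstrip(b"\n") < key:
                lo = mid + 1  # that line sorts before the key, so look later
            else:
                hi = mid      # that line is >= key (or we hit end of file)
        # Matching lines are adjacent in a sorted file, so read until one differs.
        line = line_at_or_after(f, lo)
        while line.startswith(key):
            matches.append(line.rstrip(b"\n").decode("utf-8", errors="replace"))
            line = f.readline()
    return matches
```

Calling look_prefix("joshsisto@gmail.com", "sorted_passwords.txt") would read only a few dozen lines of an 80+ GB file, because each step halves the range of offsets still to be searched, while grep has to read every byte.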