⚠️ Large suppression files

How to handle massive suppression files appropriately

Rather than uploading a massive suppression file directly, we recommend reducing it first with a script or tool that runs outside of the platform.

Use a tool that outputs only the entries found in both your contact list and the suppression file, then upload that smaller output file as your suppression list.

The free version of ChatGPT (GPT-3.5) is well suited to generating a script for this kind of task, as shown below.

Here's the prompt I used with the free version of ChatGPT to generate a Python script that runs on my MacBook Pro and processes a large suppression file of MD5 hashes (one hash per line):

I need to create a python script that can run on osx which takes a plain text file and converts each line to an md5sum, then compares each newly md5sumed line of that file to each line of a second plain text file which already consists of md5sums, I then need the matching lines from the first file to be saved to a file in the lines original format before they were converted to md5 for the match. The pre-existing md5sums file is 812MB and over 23 million lines, so this script needs to handle large file sizes while completing quickly

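The script it generated is below. Pass the plain-text file first and the file of MD5 hashes second; matching lines are written to matching_lines.txt in the current directory:

import hashlib
import sys

def main():
    # expect two arguments: the plain-text file and the file of MD5 hashes
    if len(sys.argv) != 3:
        sys.exit(f"usage: python3 {sys.argv[0]} <plain_text_file> <md5_file>")
    file1 = sys.argv[1]
    file2 = sys.argv[2]
    output_file = "matching_lines.txt"

    # load every hash from file2 into a set for fast membership tests;
    # lowercase them because hexdigest() always returns lowercase hex
    md5sums = set()
    with open(file2, "r", encoding="utf-8") as f:
        for line in f:
            md5sums.add(line.strip().lower())

    # stream file1 line by line so only the hash set is held in memory
    with open(file1, "r", encoding="utf-8") as f, \
         open(output_file, "w", encoding="utf-8") as out:
        for line in f:
            value = line.strip()  # drop the newline so it isn't hashed too
            if not value:
                continue  # skip blank lines
            md5 = hashlib.md5(value.encode("utf-8")).hexdigest()
            if md5 in md5sums:
                # save the original, unhashed value
                out.write(value + "\n")

if __name__ == "__main__":
    main()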

In testing on millions of records, the script above completed in a few seconds and produced accurate output.
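Why normalization matters: an MD5 digest changes completely if even one character differs, so each line must be cleaned up the same way the suppression file was before it was hashed. A trailing newline or a capital letter is enough to miss a match. The sketch below assumes the suppression file was built from addresses trimmed of whitespace and lowercased, which is a common convention but should be confirmed with whoever supplied the file; md5_hex and normalize are illustrative helper names, not part of any library.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Return the hex MD5 digest of a string encoded as UTF-8."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def normalize(line: str) -> str:
    """Assumed normalization: trim surrounding whitespace (including the newline) and lowercase."""
    return line.strip().lower()


# A line as read from a contacts file: mixed case and a trailing newline.
raw_line = "Jane.Doe@Example.com\n"

# Assumption: the suppression file holds MD5 hashes of trimmed, lowercased addresses.
suppression_hashes = {md5_hex("jane.doe@example.com")}

# Hashing the raw line produces a different digest, so the match is missed...
print(md5_hex(raw_line) in suppression_hashes)             # False
# ...while hashing the normalized value finds it.
print(md5_hex(normalize(raw_line)) in suppression_hashes)  # True
```

If your provider hashes addresses exactly as supplied, without lowercasing, drop .lower() from normalize.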

Here's another example: a one-line AWK command, run in the macOS Terminal, that compares two files of plain (unhashed) email addresses:

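awk -F, 'FNR==NR {a[$1]; next}; $1 in a' suppression.csv contacts.csv

This stores the first comma-separated field of every line in suppression.csv, then prints each line of contacts.csv whose first field is among them. Add " > matches.csv" to the end of the command to save the results to a file you can upload. The comparison is exact, so it is case-sensitive.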
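If the two files differ in letter case, or contain quoted fields with embedded commas, the AWK one-liner above can miss matches because it splits on every comma and compares the first field exactly. Below is a minimal Python sketch of the same comparison using the standard csv module, with a case-insensitive match on the first column. It assumes the email address is in the first column of both files; the function names and the third (output file) argument are illustrative.

```python
import csv
import sys


def load_first_column(path):
    """Return the set of first-column values in a CSV file, trimmed and lowercased."""
    values = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if row and row[0].strip():
                values.add(row[0].strip().lower())
    return values


def write_matches(suppression_path, contacts_path, output_path):
    """Copy contact rows whose first column appears in the suppression file; return the count."""
    suppressed = load_first_column(suppression_path)
    count = 0
    with open(contacts_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # case-insensitive comparison on the first column only
            if row and row[0].strip().lower() in suppressed:
                writer.writerow(row)
                count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) == 4:
        n = write_matches(sys.argv[1], sys.argv[2], sys.argv[3])
        print(f"{n} matching rows written to {sys.argv[3]}")
    else:
        print(f"usage: python3 {sys.argv[0]} suppression.csv contacts.csv output.csv")
```

Like the AWK version, this keeps the whole suppression file's first column in memory and streams the contact list. Note that a header row shared by both files (for example "email" and "Email") will also count as a match, so you may want to remove it from the output before uploading.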

To process several contact lists in one pass, add them to a segment and export the segment as a single file instead of exporting each list separately.

If you need any help with this topic, feel free to open a support ticket.
