# Large suppression files

{% hint style="danger" %}
**We recommend against uploading unnecessarily large suppression files directly into your platform.**

Many advertisers now have suppression files that have grown to 1 GB, 2 GB, or larger. The outcome of uploading files this large is unpredictable and depends on many factors, including the capacity of your hardware.
{% endhint %}

{% hint style="info" %}
**Instead, we recommend running a script or tool outside of the platform.**

Use a tool that outputs only the matches between your contact list and the suppression file, then upload that output file as your suppression list instead.

The free version of ChatGPT (3.5) can generate a script for this type of task, as shown below.
{% endhint %}

Here's the prompt I used on the free version of ChatGPT to generate a script that runs on my MacBook Pro and processes a large suppression file of MD5 hashes:

> I need to create a python script that can run on osx which takes a plain text file and converts each line to an md5sum, then compares each newly md5sumed line of that file to each line of a second plain text file which already consists of md5sums, I then need the matching lines from the first file to be saved to a file in the lines original format before they were converted to md5 for the match. The pre-existing md5sums file is 812MB and over 23 million lines, so this script needs to handle large file sizes while completing quickly

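```python
import hashlib
import sys

def main():
    # Get the filenames from the command-line arguments:
    # file1 is the plain-text contact list, file2 is the file of MD5 hashes.
    file1 = sys.argv[1]
    file2 = sys.argv[2]
    output_file = "matching_lines.txt"

    # Read the MD5 hashes from file2 into a set for fast membership checks.
    # hexdigest() returns lowercase hex, so lowercase the stored hashes too.
    md5sums = set()
    with open(file2, "r") as f:
        for line in f:
            md5sums.add(line.strip().lower())

    # Hash each line of file1 and keep the original line if its hash is in the set.
    with open(file1, "r") as f, open(output_file, "w") as out:
        for line in f:
            # Strip the trailing newline before hashing; otherwise the newline is
            # hashed too and the result will not match a hash of the bare value.
            # If the suppression file was built from lowercased addresses
            # (an assumption to verify), hash value.lower() instead.
            value = line.strip()
            md5 = hashlib.md5(value.encode("utf-8")).hexdigest()
            if md5 in md5sums:
                out.write(value + "\n")

if __name__ == "__main__":
    main()
```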

{% hint style="info" %}
The above script was tested on millions of records, completed in a few seconds, and produced accurate output.
{% endhint %}
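Before trusting any hash-based match, it is worth checking how each line is normalized before hashing. MD5 is sensitive to every byte, so a trailing newline or a difference in letter case produces a completely different digest. The short sketch below illustrates this with a hypothetical helper, `md5_of`, and a made-up address. Whether your suppression file was built from lowercased addresses is an assumption you should confirm with whoever supplied it.

```python
import hashlib

def md5_of(text: str) -> str:
    """Return the hex MD5 digest of text encoded as UTF-8 (hypothetical helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# A line as it would be read from a file: made-up address, trailing newline included.
raw_line = "User@Example.com\n"

as_read = md5_of(raw_line)                      # newline is part of the hashed bytes
stripped = md5_of(raw_line.strip())             # newline removed, case preserved
normalized = md5_of(raw_line.strip().lower())   # newline removed and lowercased

print(as_read == stripped)     # False: the newline changes the digest
print(stripped == normalized)  # False: letter case changes the digest
```

If a match file comes back unexpectedly empty, comparing a single known address by hand in this way is usually the quickest way to find out which normalization the suppression file expects.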

Here's another example: an AWK one-liner for the macOS terminal that compares two files of plain, unhashed email addresses and prints the lines of `contacts.csv` whose first field also appears in `suppression.csv`:

`awk -F, 'FNR==NR {a[$1]; next}; $1 in a' suppression.csv contacts.csv`
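The one-liner above compares the first comma-separated field exactly, so differences in letter case or stray whitespace prevent a match. As a rough Python equivalent that normalizes both sides before comparing, the sketch below may help. The function names (`normalize`, `load_suppressed`, `write_matches`) are hypothetical, and it assumes the email address is the first comma-separated field of each line, as in the AWK example.

```python
import sys

def normalize(email: str) -> str:
    """Trim surrounding whitespace and lowercase an address before comparing."""
    return email.strip().lower()

def first_field(line: str) -> str:
    """Return the normalized first comma-separated field of a line."""
    return normalize(line.split(",", 1)[0])

def load_suppressed(path: str) -> set:
    """Read the suppression file into a set of normalized addresses."""
    suppressed = set()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            key = first_field(line)
            if key:  # skip blank lines
                suppressed.add(key)
    return suppressed

def write_matches(contacts_path: str, suppressed: set, out) -> int:
    """Write each contact line whose address is suppressed; return the count."""
    count = 0
    with open(contacts_path, "r", encoding="utf-8") as f:
        for line in f:
            if first_field(line) in suppressed:
                # Write the original line unchanged, with exactly one newline.
                out.write(line.rstrip("\n") + "\n")
                count += 1
    return count

# Only run when both filenames are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    matches = load_suppressed(sys.argv[1])
    write_matches(sys.argv[2], matches, sys.stdout)
```

Saved under a hypothetical name such as `match_plain.py`, it could be run as `python3 match_plain.py suppression.csv contacts.csv > matches.csv`. Like the AWK version, it splits on plain commas and does not handle quoted CSV fields.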

{% hint style="info" %}
To make this process more efficient, you can export multiple contact lists together as one file: add them to a segment and export the segment instead.
{% endhint %}

{% hint style="info" %}
If you need any help with this topic, feel free to open a support ticket.
{% endhint %}

