
Suhem Parack

A guide to working with the Batch Compliance endpoints in the Twitter API v2

The batch compliance endpoints allow Twitter developers to upload large datasets of Tweet or User IDs and receive their compliance status, so they can determine what data requires action to keep their datasets in compliance with Twitter's developer policy. In this guide, I will show you how to use these batch compliance endpoints in Python using the Tweepy library.

Typically, there are four steps involved in working with these endpoints: creating a compliance job, uploading your dataset, checking the status of the job, and downloading the results.

Step 1: Creating a compliance job

First, you need to create a compliance job.

  • If you will be uploading a list of Tweet IDs, set the job type to tweets.
  • If you will be uploading a list of User IDs, set the job type to users.

Note: You can have one concurrent job per job type at any time.

import tweepy

# Replace BEARER_TOKEN with your own Bearer Token
client = tweepy.Client('BEARER_TOKEN')

# Replace job_name with an appropriate name for your compliance job
response = client.create_compliance_job(name='job_name', type='tweets')

upload_url = response.data['upload_url']

download_url = response.data['download_url']

job_id = response.data['id']

print(upload_url)
print(download_url)
print(job_id)

Once your job is successfully created, note the following fields in the response:

  • upload_url: the link to which you will upload your file of Tweet or User IDs
  • download_url: the link from which you will download your compliance results
  • id: the unique ID for your compliance job, which you will use to check the status of the job

Step 2: Upload your dataset

Next, you will upload your dataset as a plain text file to the upload_url from the previous step, with each line of the file containing a single Tweet ID or User ID. Replace file_path in the upload code below with the path to your file of Tweet or User IDs.

Note: The upload_url expires after 15 minutes.
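
If you do not already have your IDs in a plain text file, here is a minimal sketch of one way to write them out, one ID per line. The tweet_ids list and the ids.txt filename are illustrative placeholders, not part of the API:

# Illustrative placeholders: replace with your own Tweet or User IDs and file name
tweet_ids = ['1111111111111111111', '2222222222222222222']

# Write one ID per line, which is the format the compliance job expects
with open('ids.txt', 'w') as id_file:
    for tweet_id in tweet_ids:
        id_file.write(f'{tweet_id}\n')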

import requests

# Replace with your job upload_url
upload_url = ''

# Replace with your file path that contains the list of Tweet IDs or User IDs, one ID per line
file_path = ''

headers = {'Content-Type': "text/plain"}


def connect_to_endpoint(url):
    # Upload the contents of the file as the request body
    with open(file_path, 'rb') as dataset:
        response = requests.put(url, data=dataset, headers=headers)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.text


def main():
    response = connect_to_endpoint(upload_url)
    print(response)


if __name__ == "__main__":
    main()

Step 3: Check the status of your compliance job

Once you have uploaded your dataset, you can check the status of your job; a small polling sketch follows the list below. The status can be one of:

  • created
  • in_progress
  • failed
  • complete
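
If you would rather wait in code than check manually, here is a minimal polling sketch. The job_id placeholder and the 30-second interval are assumptions for illustration, not requirements of the API:

import time

import tweepy

client = tweepy.Client('BEARER_TOKEN')

# Replace with the job ID from step 1
job_id = ''

# Poll the job until it either completes or fails
while True:
    job = client.get_compliance_job(id=job_id)
    status = job.data['status']
    print(status)
    if status in ('complete', 'failed'):
        break
    time.sleep(30)  # arbitrary wait between checks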

Get all compliance jobs for a given type

To get all jobs for a specific type, run the following code:

import tweepy

client = tweepy.Client('BEARER_TOKEN')

# If your dataset has User IDs, use type='users' instead of 'tweets'
jobs = client.get_compliance_jobs(type='tweets')

for job in jobs.data:
    print(job)
    print(job['status'])

Get a specific compliance job by job ID

If you want to check the status of a specific job by job ID, run the following code. Replace job_id with your job_id from step 1.

import tweepy

client = tweepy.Client('BEARER_TOKEN')

# Replace with your own job ID from step 1
job_id = ''

job = client.get_compliance_job(id=job_id)

print(job.data['status'])

Step 4: Download the results

Once your job status is set to complete, you can download the results by running the following code. Replace download_url with the download_url you got in step 1.

import requests

# Replace with your job download_url
download_url = ''


def connect_to_endpoint(url):
    response = requests.get(url)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.text


def main():
    response = connect_to_endpoint(download_url)
    entries = response.splitlines()
    for entry in entries:
        print(entry)


if __name__ == "__main__":
    main()

Note: The download_url expires after one week (from when the job was created).

The result will contain a set of JSON objects (one object per line). Each object will contain a Tweet ID, the Tweet's creation date (useful for locating Tweets organized by date), the required compliance action, the reason for that action, and the date of the action.
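
To act on these results programmatically, you can parse each line with json.loads. The snippet below is a minimal sketch; the field names (id, action, reason) follow the description above and should be verified against your own output:

import json

import requests

# Replace with your job download_url
download_url = ''

response = requests.get(download_url)
response.raise_for_status()

# Each non-empty line of the result is a standalone JSON object
for line in response.text.splitlines():
    if not line.strip():
        continue
    entry = json.loads(line)
    print(entry.get('id'), entry.get('action'), entry.get('reason'))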

I hope this guide is helpful in understanding how you can work with the batch compliance endpoints in the Twitter API v2. If you have any questions, feel free to reach out to me on Twitter @suhemparack.
