S3-Compatible-Instagram-Scraper

Project Status: Inactive. The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Overview

This project is an extension of the Instagram scraper built by rarcega.

It stores the scraped Instagram data in AWS S3 using the following structure:

S3_BUCKET_NAME/
|
|-- instagram/
   |-- TARGET_USER
      |-- full-metadata.json: Contains metadata for entire operation
      |-- [POST_ID_X]
         |-- [POST_ID_X].jpg: Image of the post
         |-- summary.json: Key information associated with post
      |-- [POST_ID_Y]
         |-- [POST_ID_Y].jpg
         |-- summary.json
      | ...
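
The layout above maps directly onto S3 object keys. The following is a minimal sketch of how such keys could be built; the function names `metadata_key` and `post_keys` are hypothetical and are not taken from this repository's code.

```python
from posixpath import join  # S3 keys always use forward slashes


def metadata_key(target_user: str) -> str:
    """Key of the operation-wide metadata file for one target user."""
    return join("instagram", target_user, "full-metadata.json")


def post_keys(target_user: str, post_id: str) -> dict:
    """Keys of the image and summary objects for a single post."""
    post_prefix = join("instagram", target_user, post_id)
    return {
        "image": join(post_prefix, f"{post_id}.jpg"),
        "summary": join(post_prefix, "summary.json"),
    }
```

The bucket name is not part of the key; it is passed separately to whichever S3 client performs the upload.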

Getting Started

Prerequisites

These instructions were designed for Ubuntu 18.04.

You will need to create a config.py file with the following contents:

AWS_ACCESS_KEY_ID = "[YOUR AWS_ACCESS_KEY_ID]"
AWS_SECRET_ACCESS_KEY = "[YOUR AWS_SECRET_ACCESS_KEY]"
AWS_REGION_NAME = "[YOUR AWS_REGION_NAME]"
S3_BUCKET_NAME = "[YOUR AWS_S3_BUCKET_NAME]"
INSTAGRAM_USER_ID = "[YOUR INSTAGRAM_USER_ID]"
INSTAGRAM_USER_PASSWORD = "[YOUR INSTAGRAM_USER_PASSWORD]"
TARGET_INSTAGRAM_USER = "[YOUR TARGET_INSTAGRAM_USER TO SCRAPE DATA FROM]"

A config_template.py file has been provided for your convenience.

Now, follow these instructions to get the variables above.

NOTE: Your Instagram user ID and password are required to scrape data from private accounts that you follow.
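
Before running the scraper, it can help to confirm that every setting has been filled in. The following is a minimal sketch of such a check; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, and the repository itself may not perform this validation.

```python
# Names of the settings listed in config.py above.
REQUIRED_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_NAME",
    "S3_BUCKET_NAME",
    "INSTAGRAM_USER_ID",
    "INSTAGRAM_USER_PASSWORD",
    "TARGET_INSTAGRAM_USER",
)


def missing_settings(settings: dict) -> list:
    """Return required names that are absent, empty, or still placeholders."""
    missing = []
    for name in REQUIRED_SETTINGS:
        value = settings.get(name)
        # Assumption: every setting is a non-empty string once filled in,
        # and unfilled template values still start with "[YOUR".
        if (
            not isinstance(value, str)
            or not value.strip()
            or value.strip().startswith("[YOUR")
        ):
            missing.append(name)
    return missing
```

With a config.py in place, calling `missing_settings(vars(config))` after `import config` should return an empty list when everything is set.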

Installation

  1. Clone this repository.
    git clone https://github.com/Jordan396/S3-Compatible-Instagram-Scraper.git
    cd S3-Compatible-Instagram-Scraper/
    
  2. Create a venv and activate it.
    python3 -m venv venv
    source venv/bin/activate
    
  3. Install dependencies.
    pip install -r requirements.txt
    
  4. Add the config.py file described above to the repository's base directory.
  5. Start scraping!
    python scrape.py
    
  6. Navigate to your S3 bucket to view the scraped data.
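
As an alternative to browsing the bucket in the AWS console, the uploaded objects can be listed programmatically. The following is a minimal sketch, assuming an S3 client that behaves like the one returned by `boto3.client("s3")`; the function name `list_scraped_keys` is hypothetical, and it is an assumption that boto3 is available in your environment.

```python
def list_scraped_keys(s3_client, bucket_name: str, target_user: str) -> list:
    """Return every object key stored under instagram/<target_user>/."""
    prefix = f"instagram/{target_user}/"
    # list_objects_v2 returns at most 1000 keys per call, so use a paginator.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

A client for this function could be created with `boto3.client("s3", aws_access_key_id=..., aws_secret_access_key=..., region_name=...)`, using the values from config.py.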