Modern Web Scraping with Python using Scrapy and Splash

What you will learn

  • Understand the fundamentals of Web Scraping

  • Understand Scrapy Architecture

  • Scrape websites using Scrapy

  • Understand XPath

  • Extract and locate nodes from the DOM using XPath

  • Build a complete Spider from A to Z

  • Deploy Spiders to the cloud

  • Store the extracted data in MongoDB

  • Understand how Splash Works

  • Scrape websites that rely on JavaScript to render their content using Scrapy-Splash

  • Build a CrawlSpider

  • Understand the Crawling behavior

  • Build a custom Middleware

  • Web Scraping best practices

  • Avoid getting banned while scraping websites

  • Scrape APIs

  • Scrape infinite scroll websites

  • Deploy spiders locally

  • Deploy spiders to Heroku

  • Run spiders periodically

  • Prevent storing duplicated data

  • Deploy Splash to Heroku

  • Write Data to Excel files

  • Log in to websites using Scrapy

  • Download images and files using Scrapy

  • Use Crawlera with Scrapy

  • Add proxies to the CrawlSpider

  • Use free proxies with Scrapy

Curriculum

Section 1: Introduction - UPDATED -

Section 2: XPath Selectors

Section 3: Build a Complete Spider from A to Z

Section 4: Writing a Custom Pipeline - Store the Data in MongoDB

Section 5: Scraping JavaScript Websites using Splash

Section 6: The Crawl Spider

Section 7: Avoid Getting Banned

Section 8: Scraping APIs (REST API) - Infinite Scroll Pagination

Section 9: Hosting spiders for free - Exclusive -

Section 10: Writing data to Excel files

Section 11: Scrapy POST requests

Section 12: The Media Pipeline

Section 13: Paid and Free proxies with Scrapy/Splash

Course Description

Become an expert in web scraping and web crawling using Python 3, Scrapy and Scrapy Splash

Requirements

  • Basics of Python
  • Basics of HTML
  • Basics of JavaScript
  • Internet access

Description

Web scraping has become one of the hottest topics these days. There are plenty of paid tools on the market, but they don't show you how things are done, and as a consumer you will always be limited to the functionality they offer.

In this course you won't be a consumer anymore. I'll teach you how to build your own scraping tool (a spider) using Scrapy.

You will learn:

  1. The fundamentals of Web Scraping

  2. How to build a complete spider

  3. The fundamentals of XPath

  4. How to locate content/nodes from the DOM using XPath

  5. How to store the data in JSON, CSV... and even in an external database (MongoDB)

  6. How to write your own custom Pipeline

  7. Fundamentals of Splash

  8. How to scrape JavaScript websites using Scrapy Splash

  9. The Crawling behavior

  10. How to build a CrawlSpider

  11. How to avoid getting banned while scraping websites

  12. How to build a custom Middleware

  13. Web Scraping best practices

  14. How to scrape APIs

  15. How to scrape infinite scroll websites

  16. Host spiders on Heroku for free

  17. Run spiders periodically with a custom script

  18. Prevent storing duplicated data

  19. Deploy Splash to Heroku

  20. Write data to Excel files

  21. Log in to websites using FormRequest

  22. Download Files & Images using Scrapy

  23. Use Proxies with Scrapy Spider

  24. Use Crawlera with Scrapy & Splash

  25. Use Proxies with CrawlSpider
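
To give a rough idea of what "locating nodes using XPath" (point 4 above) looks like in practice, here is a minimal sketch. It is not taken from the course material: it uses Python's standard-library xml.etree.ElementTree, which supports only a small subset of XPath, so that it runs without installing anything. The markup, the class names and the extract_products helper are all made up for illustration. In a Scrapy spider you would typically call response.xpath(...) instead, which supports full XPath 1.0 and copes with real-world HTML that is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Laptop</h2>
      <span class="price">999</span>
    </div>
    <div class="product">
      <h2>Phone</h2>
      <span class="price">499</span>
    </div>
    <div class="ad">
      <h2>Sponsored</h2>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return a list of {"name", "price"} dicts, one per product node."""
    root = ET.fromstring(markup)
    products = []
    # ".//div[@class='product']" selects every <div>, at any depth,
    # whose class attribute is exactly "product" (the ad block is skipped).
    for node in root.findall(".//div[@class='product']"):
        # Paths used on a node are relative to that node.
        name = node.findtext("h2")
        price = node.findtext("span[@class='price']")
        products.append({"name": name, "price": int(price)})
    return products


if __name__ == "__main__":
    for product in extract_products(PAGE):
        print(product)
```

The idea is the same whatever library evaluates the expression: first select the repeating container nodes, then run short relative expressions inside each one to pull out the individual fields.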

What makes this course different from the others, and why should you enroll?

  • First, this is the most up-to-date course. You will be using Python 3.6, Scrapy 1.5 and Splash 2.0.

  • You will have an in-depth step by step guide on how to become a professional web scraper.

  • I'll show you how other courses scrape JavaScript websites using Selenium and why you shouldn't do it their way.

  • You will learn how to use Splash to scrape JavaScript websites, and I can assure you that you won't find any tutorial out there that teaches how to really use Splash the way I do in this course.

  • You will learn how to host spiders on Heroku, as well as Splash (Exclusive).

  • You will learn how to create a custom script so spiders can run periodically without any intervention from you.

  • 30-day money-back guarantee by Udemy

So whether you are a data analyst who wants to add web scraping to your tool set, or someone who wants to learn how to extract data from unstructured HTML web pages and store it in a structured way for data analysis, you are welcome to join this course.

**STUDENTS' THOUGHTS ABOUT THIS COURSE**

"I was particularly looking for web scraping using XPATHs and this course is addressing that. It also covers dynamic paging. A proper mix of theory and practical. A must-have for those who want to do web scraping. GREAT learning experience!!!" By Hiran Kumar

"90% of what I was searching for!!! Great job!! Clear explanations and great communication with Ahmed." By Raylyson Estanista

"Ahmed’s Web scraping course is awesome. His approach using Python with scrapy and splash works well with all websites especially those that make heavy use of JavaScript. Ahmed is a gifted educator: expert communicator, passionate, conscientious and accessible to his students. I highly recommend this course and any of Ahmed Rafik’s Udemy courses." By Richard Blackmon

"Great course, and a nice introduction to Scrapy (I'm someone with no Python experience whatsoever)." By I S

"Excellent course. Quick and thorough at the same time. Ahmed is incredibly responsive to the students and often replies to questions within minutes! Highest recommendation." By Robert Nolte

"That course is very good and explanation is crystal clear! The instructor is very supportive in case of questions. Highly recommended." By Shubina Ekaterina

"I like the course. Clear explanations and good communication with Ahmed. All topics are interesting and full of information. I improved my skills in Scrapy. The author updates the course content with new videos. It's a big bonus) Explained more advanced topics I never see in other courses. Thank you, Ahmed. Waiting for new videos)" By Ruslan Romanenko

Who this course is for:

  • Anyone who wants to scrape data from any website
  • Anyone who wants to learn Scrapy
  • Anyone who wants to automate the task of copying contents from websites
  • Anyone who wants to learn how to scrape JavaScript websites using Scrapy-Splash
  • Anyone who wants to learn the basics of XPath
  • Anyone who wants to learn Scrapy Splash