Gathering Open Data: Web Scraping and APIs

Lecturer: Jun.-Prof. Dr. Tristan Becker
TU Dresden, Junior Professor in Business Administration, esp. Management Science
Date: 10 and 11 September 2024
Classes run from 9:00 am to 4:00 pm.
Room/Address: TU Dresden,
Georg-Schumann-Bau (SCH/B37)
Seminar content: The internet contains vast amounts of open data. In most cases, however, the desired data cannot simply be downloaded in structured form; instead, it is distributed across many web pages. Web scraping is a technique for automating the extraction of such data. Applications are numerous: gathering price data from online shops, collecting information from social media websites such as Facebook or Twitter, gathering data from job networks, and collecting general information such as sports results or movie ratings. With web scraping, data from a large number of web pages can be collected quickly and saved as a structured data set. Such data has potential for all kinds of research projects, e.g., projects using statistical or optimization methods.
In this course, we will explore the fundamentals of web scraping with Python 3. We will first learn how to access APIs with Python and then turn to scraping web pages directly. Topics include an overview of the fundamental elements that make up websites, Python libraries for web scraping (such as requests, Beautiful Soup, Scrapy, and Selenium WebDriver), and a brief discussion of data storage. Finally, we will work through examples of scraping real websites.
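As a rough illustration of the extract-and-store workflow described above, the following sketch parses a small product listing and saves the extracted fields as CSV. To stay dependency-free, it uses Python's built-in `html.parser` as a stand-in for Beautiful Soup. The HTML snippet, the class names `name` and `price`, and the helper `scrape_to_csv` are hypothetical and chosen only for illustration. In a real scraper, the HTML would typically be downloaded first, e.g. with `requests.get(url).text`.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page fetched with e.g. requests.get(url).text
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Coffee</span><span class="price">4.99</span></li>
  <li class="product"><span class="name">Tea</span><span class="price">3.49</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects records from <span class="name"> and <span class="price"> tags (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished records, one dict per <li>
        self._field = None   # field currently being read ("name" or "price")
        self._current = {}   # record under construction

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # A closing </li> marks the end of one product record.
            self.rows.append(self._current)
            self._current = {}

    def handle_data(self, data):
        # Only keep text that appears inside a tracked <span>.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()


def scrape_to_csv(html):
    """Parse the HTML and return the extracted records as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML))
```

With Beautiful Soup, the parser class would usually shrink to a few `select()` calls; the standard-library version simply makes each extraction step explicit. Writing the result to a file instead of a string buffer is a one-line change to the `StringIO` object.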
Prerequisites: Basic programming skills in Python 3 are recommended.
Certificate: Ph.D. students from the Faculty of Business and Economics at TU Dresden can earn a certificate in accordance with § 9 of the doctoral regulations (PromO 2018):
Ph.D. students of Business Administration: § 9 (1) No. 5 or 6
Ph.D. students of Business Information Systems: § 9 (1) No. 6
Ph.D. students of Economics: § 9 (1) No. 6

Ph.D. students from other universities can earn a certificate as well.
Assignment: Students must complete a brief web scraping assignment by picking a website and applying the web scraping skills from this course to compile a data set (e.g., weather data, sports results, or price data). They must submit both their code and their data.
Registration: To register, send an e-mail to Dr. Uta Schwarz at uta.schwarz@tu-dresden.de
Phone: +49 351 463-33141