Information retrieval from the Internet

Instructor: Dr Slavko Žitnik
Date and time: 25 May 2026, 11:00 – 15:00
Location: Faculty of Computer and Information Science (Večna pot 113, 1000 Ljubljana), Lecture room P03
Level: Beginner to intermediate. No prior knowledge is required, but a basic understanding of website structure and/or Python programming is desirable.

An enormous amount of data is published online. Large companies also collect this data and use it to build large language models. For a specific analysis, however, we may need data from only a particular part of the internet, so it is important to know how to obtain it as efficiently as possible. We will explore how data is structured on the web and the tools that enable the automatic crawling and extraction of data from websites. For more advanced users, we will also demonstrate how to use similar tools programmatically.
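As a rough illustration of what programmatic extraction can look like, the sketch below pulls links out of a small HTML fragment using only the Python standard library. The sample page and the names SAMPLE_HTML, LinkExtractor and extract_links are hypothetical, chosen for this example; the tools actually used in the course may differ.

```python
from html.parser import HTMLParser

# Hypothetical sample page (an assumption for illustration).
# A real workflow would first download the HTML from a website.
SAMPLE_HTML = """
<html><body>
  <h1>Course list</h1>
  <ul>
    <li><a href="/courses/web-retrieval">Information retrieval</a></li>
    <li><a href="/courses/python-intro">Python basics</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> elements that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return the list of (href, text) pairs."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


for href, text in extract_links(SAMPLE_HTML):
    print(href, text)
```

The same pattern (parse the page structure, then keep only the elements of interest) carries over to other elements such as headings or table cells.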

Outcomes: Knowledge about how data is structured on the web, as well as tools that enable the automatic crawling and extraction of data from websites.
Skills you will gain:

Know the types and formats in which content is presented on websites.
Understand the concept of web scraping.
Know and use tools for the automated extraction of data from the web.
Build a simple workflow for obtaining data from the web.
Assess the possibilities of obtaining data from the web.
Collect data from the web.

Language: Slovenian (or English if there are foreign applicants)