The parser includes two templates:
- Description and title parser
- Recommended video parser
Youtube Video Info Parser
Performs a GET request on the video page and extracts the title and description.
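As a rough illustration of this step (not the template itself), here is a minimal Python sketch. It assumes the video page exposes its title and description in Open Graph meta tags (og:title, og:description); the names MetaTagParser, extract_title_and_description and fetch_video_page are hypothetical, and the sample HTML is made up so the example runs without a network connection.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class MetaTagParser(HTMLParser):
    """Collects <meta property="..." content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            # Keep the first value seen for each property.
            self.meta.setdefault(key, content)


def extract_title_and_description(html):
    """Return (title, description); a missing tag yields None in its place."""
    parser = MetaTagParser()
    parser.feed(html)
    # Assumption: the page carries og:title and og:description meta tags.
    return parser.meta.get("og:title"), parser.meta.get("og:description")


def fetch_video_page(video_id):
    """Plain GET request on the watch page (hypothetical helper, not called below)."""
    request = Request(
        "https://www.youtube.com/watch?v=" + video_id,
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up page source standing in for a real response.
sample = (
    "<html><head>"
    '<meta property="og:title" content="Sample title">'
    '<meta property="og:description" content="Sample description">'
    "</head><body></body></html>"
)
title, description = extract_title_and_description(sample)
```

In real use the string returned by fetch_video_page would be passed to extract_title_and_description; if the page layout differs from the assumption, both values simply come back as None.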
Youtube ID Parser
The template is seeded with the ID of a single video, then parses the IDs of the videos recommended alongside it.
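The idea of growing a list of IDs from one seed could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the template's actual logic: it assumes IDs are 11 characters long and appear in the page source as "videoId":"..." pairs, and fetch_page is a hypothetical callable that returns the page source for a given ID.

```python
import re
from collections import deque

# Assumption: IDs are 11 characters from [A-Za-z0-9_-] and show up in the
# page source as "videoId":"<id>".
VIDEO_ID_PATTERN = re.compile(r'"videoId":"([A-Za-z0-9_-]{11})"')


def extract_video_ids(page_source):
    """Return the unique video IDs found, in order of first appearance."""
    return list(dict.fromkeys(VIDEO_ID_PATTERN.findall(page_source)))


def crawl_ids(seed_id, fetch_page, limit):
    """Breadth-first expansion from one seed ID until `limit` IDs are known."""
    seen = {seed_id}
    queue = deque([seed_id])
    while queue and len(seen) < limit:
        current = queue.popleft()
        for video_id in extract_video_ids(fetch_page(current)):
            if video_id not in seen and len(seen) < limit:
                seen.add(video_id)
                queue.append(video_id)
    return seen


# Made-up pages standing in for real responses.
fake_pages = {
    "aaaaaaaaaaa": '"videoId":"bbbbbbbbbbb","videoId":"ccccccccccc","videoId":"bbbbbbbbbbb"',
    "bbbbbbbbbbb": '"videoId":"ddddddddddd"',
}
found = crawl_ids("aaaaaaaaaaa", lambda vid: fake_pages.get(vid, ""), limit=10)
```

Passing the fetch function in as an argument keeps the crawl loop independent of how pages are actually downloaded.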
The IDs have to be collected first; I gathered around 402k in a week. After that, the video info template is launched. Ideally both templates run at the same time, since parsing IDs will always be faster than parsing descriptions and titles.
Data is saved in a MySQL database, because storing it in files is slow and inefficient. You'll need to install MySQL itself and, optionally, phpMyAdmin.
Installation Method #1:
Download the installer from https://dev.mysql.com/downloads/installer/. phpMyAdmin additionally requires PHP and a web server (Apache or nginx). You can also download a ready-made bundle such as WAMP or XAMPP.
Installation Method #2:
Install Docker Desktop. Prepare the docker-compose.yml file, navigate to the directory containing it, and run:
docker-compose up
After the containers start, phpMyAdmin will be available at localhost:8080.
Create a table with the following structure
Create a unique index on youtube_id to avoid duplicates, plus a primary key on the auto-increment column. Create the indexes before populating the table with data.
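To show how a unique index keeps duplicates out, here is a small Python sketch. It is only an illustration under stated assumptions: to stay runnable without a server it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name videos and the title and description columns are hypothetical (only youtube_id is named in this post). In MySQL the corresponding pieces would be an AUTO_INCREMENT primary key, a UNIQUE index and INSERT IGNORE.

```python
import sqlite3

# Stand-in database: in-memory SQLite instead of MySQL.
# Hypothetical schema: only youtube_id comes from the post; the table name
# and the other columns are assumptions for illustration.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE videos ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " youtube_id TEXT NOT NULL,"
    " title TEXT,"
    " description TEXT)"
)
# The unique index is created before any rows are inserted.
connection.execute("CREATE UNIQUE INDEX idx_youtube_id ON videos (youtube_id)")


def save_ids(conn, video_ids):
    """Insert IDs, silently skipping any that are already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO videos (youtube_id) VALUES (?)",
        [(video_id,) for video_id in video_ids],
    )
    conn.commit()


save_ids(connection, ["nok4P9cYw_g", "aaaaaaaaaaa", "nok4P9cYw_g"])
save_ids(connection, ["aaaaaaaaaaa", "bbbbbbbbbbb"])
stored = [
    row[0]
    for row in connection.execute("SELECT youtube_id FROM videos ORDER BY id")
]
```

Letting the database reject repeats means the parser does not have to check for each ID before inserting it.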
The database is ready. Manually add one row containing the first ID. For example, from https://www.youtube.com/watch?v=nok4P9cYw_g take the ID nok4P9cYw_g; any video will do. Then run Youtube ID Parser. Once at least the first 1000 IDs have been collected, you can start Youtube Video Info Parser.
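If you would rather not copy the ID by hand, it can be pulled out of the URL with a few lines of Python. This is a minimal sketch that assumes the usual watch?v=... URL form shown in the example; extract_video_id is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the "v" query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


seed_id = extract_video_id("https://www.youtube.com/watch?v=nok4P9cYw_g")
print(seed_id)  # nok4P9cYw_g
```

Other URL shapes (short links, embeds) are not handled here and would return None.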