I am scraping an e-commerce website, and my code returns an empty list.
This is the code I wrote:
import requests
from bs4 import BeautifulSoup
baseurl = 'https://www.thewhiskyexchange.com/'
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 Edg/126.0.0.0'}
r = requests.get('https://www.thewhiskyexchange.com/c/35/japanese-whisky')
soup = BeautifulSoup(r.content, 'lxml')
productlist = soup.find_all("li", class_ = "product-grid__item")
print(productlist)
This is the output I get:
[]
As already mentioned in a comment, you are being blocked. However, you can get around this by using Playwright.
Install the playwright package, then install Chromium:
pip3 install playwright
playwright install chromium
The code below gets the names and URLs of the products, selected with the CSS class you provided.
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

URL = "https://www.thewhiskyexchange.com/c/35/japanese-whisky"

playwright = sync_playwright().start()
# headless=False shows the browser window; slow_mo delays each action by 2000 ms
browser = playwright.chromium.launch(headless=False, slow_mo=2000)
context = browser.new_context(
    viewport={"width": 1280, "height": 900}
)
page = context.new_page()
page.goto(URL)
# Wait until the product grid has been rendered
page.locator(".product-grid__list").wait_for()
soup = BeautifulSoup(page.content(), 'lxml')
for product in soup.select("li.product-grid__item > a"):
    name = product.select_one("p.product-card__name").text
    url = product["href"]
    print(f"{name} | {url}")
page.close()
# Also shut down the browser and the Playwright driver
browser.close()
playwright.stop()
If you don't want to see the browser window, set headless=True.
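As a rough sketch of the headless variant, the script above could also be written with Playwright's context manager, which stops the driver when the block exits. The function names fetch_rendered_html and extract_products are hypothetical helpers introduced here for illustration. The sketch also assumes that the fused names in the output (for example "MWRMizunara") come from nested elements inside the name paragraph, so it joins text pieces with a space; whether the site accepts a headless browser is not something this sketch can guarantee.

```python
from bs4 import BeautifulSoup

URL = "https://www.thewhiskyexchange.com/c/35/japanese-whisky"


def extract_products(html):
    """Return (name, href) pairs for every product card found in html."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for link in soup.select("li.product-grid__item > a"):
        name_tag = link.select_one("p.product-card__name")
        if name_tag is None:
            # Skip cards that do not carry a name element
            continue
        # Join nested text pieces with a space instead of fusing them
        name = name_tag.get_text(" ", strip=True)
        products.append((name, link.get("href")))
    return products


def fetch_rendered_html(url=URL):
    """Load the page in headless Chromium and return the rendered HTML."""
    # Imported here so extract_products can be used without Playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": 1280, "height": 900})
        page.goto(url)
        # Same wait condition as in the answer's script
        page.locator(".product-grid__list").wait_for()
        html = page.content()
        browser.close()
    return html
```

Calling extract_products(fetch_rendered_html()) would then give a list of tuples that you can print or store as you like.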
Sample output:
Hibiki Harmony | /p/29388/hibiki-harmony
Nikka Coffey Grain Whisky | /p/23928/nikka-coffey-grain-whisky
Hakushu Distiller's Reserve | /p/23771/hakushu-distillers-reserve
Yamazaki 12 Year Old | /p/2940/yamazaki-12-year-old
Suntory Toki | /p/36362/suntory-toki
Yamazaki Distiller's Reserve | /p/23772/yamazaki-distillers-reserve
Ichiro's Malt MWRMizunara Wood Reserve | /p/46186/ichiros-malt-mwr-mizunara-wood-reserve
Chichibu Red Wine Cask 2023 | /p/80949/chichibu-red-wine-cask-2023
Kanosuke Single Malt | /p/72178/kanosuke-single-malt
Hibiki 21 Year Old | /p/10134/hibiki-21-year-old
Fuji Single Malt Whisky | /p/72434/fuji-single-malt-whisky
Yamazaki 18 Year OldGift Box | /p/81705/yamazaki-18-year-old-gift-box
The Chita Distiller's Reserve | /p/44794/the-chita-distillers-reserve
Yoichi Single Malt | /p/32761/yoichi-single-malt
Kaiyo Mizunara Oak Cask Strength | /p/45367/kaiyo-mizunara-oak-cask-strength
Mars Tsunuki2022 Edition | /p/71682/mars-tsunuki-2022-edition
Kanosuke Single Malt 2022 Limited Edition | /p/71032/kanosuke-single-malt-2022-limited-edition
Nikka Miyagikyo PeatedDiscovery Series 2021 | /p/61449/nikka-miyagikyo-peated-discovery-series-2021
Miyagikyo Single Malt | /p/32762/miyagikyo-single-malt
Yamazaki 12 Year Old100th Anniversary | /p/71847/yamazaki-12-year-old-100th-anniversary
Hakushu 12 Year Old | /p/2922/hakushu-12-year-old
Chichibu The Peated 2022 | /p/70564/chichibu-the-peated-2022
Kanosuke Double Distillery Blended Whisky | /p/81245/kanosuke-double-distillery-blended-whisky
Kanosuke Hioki Pot Still | /p/81244/kanosuke-hioki-pot-still
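The URLs in this output are relative to the site root. If you need full links, one option is to join them with the base URL from the question using the standard library. This is only a small sketch; absolute_url is a hypothetical helper name, not something from the original code.

```python
from urllib.parse import urljoin

# Base URL taken from the question's code
BASE_URL = "https://www.thewhiskyexchange.com/"


def absolute_url(href, base=BASE_URL):
    """Turn a site-relative href such as /p/123/name into a full URL."""
    return urljoin(base, href)


print(absolute_url("/p/29388/hibiki-harmony"))
# prints https://www.thewhiskyexchange.com/p/29388/hibiki-harmony
```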
Print soup and you will see the problem. They use bot detection to prevent what you are trying to do.
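If printing the whole soup is too noisy, a shorter check is to look at the status code and the first few hundred characters of the response. The sketch below assumes nothing about what the site actually returns; describe and fetch_and_describe are hypothetical helper names. Note also that the question's code defines headers but never passes them to requests.get, so this version passes them explicitly.

```python
URL = "https://www.thewhiskyexchange.com/c/35/japanese-whisky"
# User-Agent copied from the question's code
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 Edg/126.0.0.0"
    )
}


def describe(status_code, body, limit=300):
    """Summarize a response: status code, body length, start of the body."""
    # Collapse whitespace so the preview fits on one line
    preview = " ".join(body.split())[:limit]
    return f"status={status_code} length={len(body)} preview={preview!r}"


def fetch_and_describe(url=URL):
    """Request the page with the headers applied and summarize the reply."""
    # Imported here so describe() can be used without requests installed
    import requests

    r = requests.get(url, headers=HEADERS, timeout=30)
    return describe(r.status_code, r.text)
```

Running print(fetch_and_describe()) shows at a glance whether you received the product page or a block or challenge page instead.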