Scraping.Scraper
c:\users\adamj\onedrive\pulpit\olx-scrapper\src\scraping\scraper.py

 
Modules
       
aiohttp
asyncio
bs4
json
os
pandas
re

 
Classes
       
builtins.object
Scraper

 
class Scraper(builtins.object)
    Scraper(url_strings: list[src.Scraping.URLBuilder.URLBuilder], page_limit: int) -> None
 

 
  Methods defined here:
__init__(self, url_strings: list[src.Scraping.URLBuilder.URLBuilder], page_limit: int) -> None
    Initializes a Scraper for scraping data from OLX.
    :param url_strings: List of URLBuilder objects for scraping data.
    :param page_limit: Limit of pages to scrape for each URL.
add_url(self, url: src.Scraping.URLBuilder.URLBuilder) -> None
    Adds a URLBuilder object to the list of URLs to scrape.
    :param url: URLBuilder object to add.
    :return: None.
find_count(self, soup: bs4.BeautifulSoup) -> int
    Finds the number of listings on the page.
    :param soup: BeautifulSoup object to search for the listing count.
    :return: Number of listings on the page.
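OLX's real markup is not shown here, so the selector and text format below are assumptions; a minimal sketch of extracting a listing count from a parsed page could look like this:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page fragment; the data-testid attribute and the
# "Znaleziono ... ogłoszeń" phrasing are assumptions, not OLX's real markup.
HTML = '<div data-testid="total-count">Znaleziono 1 245 ogłoszeń</div>'

def find_count(soup: BeautifulSoup) -> int:
    """Pull the first integer out of the element announcing the result count."""
    tag = soup.find("div", attrs={"data-testid": "total-count"})
    if tag is None:
        return 0
    digits = re.sub(r"\D", "", tag.get_text())  # keep only the digits
    return int(digits) if digits else 0

soup = BeautifulSoup(HTML, "html.parser")
count = find_count(soup)  # 1245
```

Stripping all non-digits with a regex side-steps locale-specific thousands separators (the space in "1 245").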
load_scraping_history(self) -> list[dict[str, typing.Union[str, datetime.datetime]]]
    Loads the scraping history from the scraping history file.
    :return: List of scraping history entries.
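The history file's layout is not documented here; assuming it is a JSON list of entries whose "date" field is an ISO-8601 string, loading it back into `datetime` objects could be sketched like this:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Demo file in the temp directory; the real scraper's path/layout may differ.
HISTORY_FILE = Path(tempfile.gettempdir()) / "scraping_history_demo.json"

def load_scraping_history(path: Path = HISTORY_FILE) -> list[dict]:
    """Load history entries; a missing file yields an empty list."""
    if not path.exists():
        return []
    entries = json.loads(path.read_text(encoding="utf-8"))
    for entry in entries:
        # Revive the stored ISO-8601 string into a datetime object.
        entry["date"] = datetime.fromisoformat(entry["date"])
    return entries

# Demo: write one entry, then load it back.
HISTORY_FILE.write_text(json.dumps(
    [{"url": "https://www.olx.pl/", "date": "2024-05-01T12:00:00"}]
))
history = load_scraping_history()
```

Returning an empty list for a missing file lets a first-ever scrape proceed without special-casing.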
save_scrape_date(self) -> None
    Saves the date of the last scrape to the scraping history file.
    :return: None.
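Assuming the same JSON-list layout as above, appending a timestamped entry on each scrape could look like this (file path and entry shape are illustrative):

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

HISTORY = Path(tempfile.gettempdir()) / "save_date_demo.json"
HISTORY.unlink(missing_ok=True)  # start the demo from a clean slate

def save_scrape_date(path: Path = HISTORY) -> None:
    """Append a timestamped entry to the history file, creating it if absent."""
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append({"date": datetime.now().isoformat(timespec="seconds")})
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")

save_scrape_date()
save_scrape_date()  # the file now holds two entries
```

Storing ISO-8601 strings keeps the file human-readable and round-trips cleanly through `datetime.fromisoformat`.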
async scrape_data(self, progress_callback: Callable[[int], NoneType] = None) -> dict[str, pandas.core.frame.DataFrame]
    Scrapes data from the URLs asynchronously.
    :param progress_callback: Callback function to update the progress bar.
    :return: Dictionary of DataFrames with scraped data.
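The real method fetches with aiohttp and returns pandas DataFrames; this stdlib-only sketch substitutes a stub `fetch_page` and plain strings to show the gather-plus-progress-callback pattern:

```python
import asyncio
from typing import Callable, Optional

async def fetch_page(url: str) -> str:
    # Stand-in for an aiohttp GET request.
    await asyncio.sleep(0)
    return f"<html>{url}</html>"

async def scrape_data(
    urls: list[str],
    progress_callback: Optional[Callable[[int], None]] = None,
) -> dict[str, str]:
    """Fetch all URLs concurrently, reporting progress after each result."""
    pages = await asyncio.gather(*(fetch_page(u) for u in urls))
    results: dict[str, str] = {}
    for done, (url, html) in enumerate(zip(urls, pages), start=1):
        results[url] = html
        if progress_callback:
            progress_callback(done)  # e.g. advance a progress bar
    return results

seen: list[int] = []
data = asyncio.run(scrape_data(["a", "b"], progress_callback=seen.append))
```

`asyncio.gather` preserves input order, so results can be zipped back to their URLs; a callback taking the completed count matches the `Callable[[int], None]` signature above.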
update_url_list(self, config: dict) -> None
    Updates the URL list with the given configuration.
    :param config: Configuration dictionary.
    :return: None.

Data descriptors defined here:
__dict__
    dictionary for instance variables (if defined)

__weakref__
    list of weak references to the object (if defined)

 
Data
        Callable = typing.Callable
Union = typing.Union