- builtins.object
  - Scraper

class Scraper(builtins.object)

    Scraper(url_strings: list[src.Scraping.URLBuilder.URLBuilder], page_limit: int) -> None
Methods defined here:

- __init__(self, url_strings: list[src.Scraping.URLBuilder.URLBuilder], page_limit: int) -> None
  Scraper class for scraping data from OLX.
  :param url_strings: List of URLBuilder objects for scraping data.
  :param page_limit: Limit of pages to scrape for each URL.
  (A usage sketch follows the method list.)
- add_url(self, url: src.Scraping.URLBuilder.URLBuilder) -> None
  Adds a URLBuilder object to the list of URLs to scrape.
  :param url: URLBuilder object to add.
  :return:

- find_count(self, soup: bs4.BeautifulSoup) -> int
  Finds the number of listings on the page.
  :param soup: Soup object to search for the count.
  :return: Number of listings on the page.

- load_scraping_history(self) -> list[dict[str, typing.Union[str, datetime.datetime]]]
  Loads the scraping history from the scraping history file.
  :return: List of scraping history entries.

- save_scrape_date(self) -> None
  Saves the date of the last scrape to the scraping history file.
  :return:

- async scrape_data(self, progress_callback: Callable[[int], NoneType] = None) -> dict[str, pandas.core.frame.DataFrame]
  Scrapes data from the URLs asynchronously.
  :param progress_callback: Callback function to update the progress bar.
  :return: Dictionary of data frames with scraped data.

- update_url_list(self, config: dict) -> None
  Updates the URL list with the given configuration.
  :param config: Configuration dictionary.
  :return:
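The listing below is a minimal usage sketch, not part of the generated documentation. It relies only on the signatures above; the URLBuilder keyword arguments, the Scraper module path, and the meaning of the progress value passed to the callback are assumptions.

```python
import asyncio

from src.Scraping.URLBuilder import URLBuilder
from src.Scraping.Scraper import Scraper  # module path for Scraper is assumed from the package layout


def show_progress(value: int) -> None:
    # Hypothetical callback; scrape_data passes it an int, per the signature above.
    print(f"scraping... {value}")


async def main() -> None:
    # URLBuilder keyword arguments are illustrative only; its real constructor may differ.
    flats = URLBuilder(category="mieszkania", city="warszawa")
    scraper = Scraper(url_strings=[flats], page_limit=5)

    # add_url appends another URLBuilder to the list passed to __init__.
    scraper.add_url(URLBuilder(category="mieszkania", city="krakow"))

    # scrape_data is a coroutine; it returns a dict mapping names to pandas DataFrames.
    frames = await scraper.scrape_data(progress_callback=show_progress)
    for name, df in frames.items():
        print(name, df.shape)

    # Record when this scrape happened so later runs can read it back
    # via load_scraping_history().
    scraper.save_scrape_date()


if __name__ == "__main__":
    asyncio.run(main())
```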
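A second sketch combines the two history helpers. The entries are documented only as list[dict[str, Union[str, datetime]]], so the "date" key used below is a hypothetical field name.

```python
from datetime import datetime
from typing import Optional

from src.Scraping.Scraper import Scraper  # module path assumed, as above


def last_scrape_date(scraper: Scraper) -> Optional[datetime]:
    # load_scraping_history() returns a list of dicts mixing str and datetime values;
    # the "date" key is an assumption about how each entry is laid out.
    history = scraper.load_scraping_history()
    dates = [entry["date"] for entry in history if isinstance(entry.get("date"), datetime)]
    return max(dates) if dates else None
```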
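Finally, find_count presumably reads the listings total out of the parsed page. The snippet below shows the general BeautifulSoup pattern only; the tag and attribute it targets are placeholders, not the selectors the class actually uses on OLX.

```python
import re

from bs4 import BeautifulSoup


def count_listings(soup: BeautifulSoup) -> int:
    # Placeholder selector: the element/attribute OLX really uses is an assumption.
    node = soup.find("span", attrs={"data-testid": "total-count"})
    if node is None:
        return 0
    match = re.search(r"\d+", node.get_text())
    return int(match.group()) if match else 0
```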
Data descriptors defined here:

- __dict__
  dictionary for instance variables (if defined)
- __weakref__
  list of weak references to the object (if defined)