[ Distinguishing between listings and related items in eBay search results using WebDriver? ]
For a typical eBay search results page such as this, I'm using WebDriver to extract the price of each result thus:
from selenium.webdriver.common.by import By
PRICEELEMENT = 'ul:nth-child(3) > li:nth-child(1) > span:nth-child(1)'
prices = [float(price.text.replace('USD', '').replace(',', '')) for price in driver.find_elements(By.CSS_SELECTOR, PRICEELEMENT)]
This is working well. It grabs the prices of both the actual listings and the "More items related to" items.
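Stripping a fixed 'USD' prefix before calling float() fails on other currency symbols (such as $) or on price ranges. A more tolerant parser is sketched here, assuming the price is the first number in each element's text (e.g. 'USD 12.99', '$1,234.50', or a range such as '$10.00 to $20.00', where it takes the lower bound). parse_price is a hypothetical helper, not part of Selenium.

```python
import re

# Hypothetical helper: pulls the first number (with optional thousands
# separators and decimals) out of an eBay price string.
PRICE_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Convert price text such as 'USD 12.99' or '$1,234.50' to a float.

    Assumption: the first number in the text is the price, so for a range
    like '$10.00 to $20.00' the lower bound is returned.
    """
    match = PRICE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    # Drop thousands separators before converting
    return float(match.group(0).replace(",", ""))
```

It can then stand in for the inline conversion, e.g. prices = [parse_price(p.text) for p in driver.find_elements(By.CSS_SELECTOR, PRICEELEMENT)], with By imported from selenium.webdriver.common.by.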
Now, in cases like the one in the link above, where there are only 3 results and the rest are "related", I want to extract only the prices of the actual listings. Specifically, when there are between 1 and 5 (inclusive) actual listings, I want only those prices.
I can't see how the page, apart from the text "More items related to Mizuno Pants Belted Padded", distinguishes the search results from the related items. All of them have the same CSS selector (ul:nth-child(3) > li:nth-child(1) > span:nth-child(1)) and class name (bold bidsold), regardless of whether they are actual listings or related items.
If I have to, I can first fetch the number of listings X, then only consider the first X prices in prices. But is there a way to use the page structure itself to achieve this?
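The count-then-slice fallback can be sketched as two small helpers. This assumes the result count appears in header text such as '3 results found for ...'; that wording, and the names listing_count and actual_listing_prices, are hypothetical, so check the real page text. It also relies on the actual listings preceding the related items in page order, which is what makes taking the first X prices valid.

```python
import re


def listing_count(results_text):
    """Extract the result count from header text like '3 results found for ...'.

    Hypothetical: the exact wording of eBay's result-count header is assumed.
    """
    match = re.search(r"(\d[\d,]*)\s+results?", results_text)
    if match is None:
        raise ValueError(f"no result count in {results_text!r}")
    return int(match.group(1).replace(",", ""))


def actual_listing_prices(prices, count, max_listings=5):
    """Keep only the first `count` prices when there are 1..max_listings listings.

    Assumes real listings come before related items on the page.
    Outside that range the price list is returned unchanged.
    """
    if 1 <= count <= max_listings:
        return prices[:count]
    return prices
```

With header_text read from the page (a hypothetical variable), actual_listing_prices(prices, listing_count(header_text)) returns only the real listings' prices when there are 1 to 5 of them.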
Answer 1
Keep it simple: how would you, as a human, tell which of the listings are actual search results and which are "related"? I'd guess by the "More items related to ..." label in the middle. Let's use that, with the help of the preceding-sibling and following-sibling XPath axes:
from selenium.webdriver.common.by import By
# Listing <li>s that come before / after the "More items related to" label
search_results = driver.find_elements(By.XPATH, "//li[.//*[contains(., 'More items related to')]]/preceding-sibling::li[@listingid]")
related_results = driver.find_elements(By.XPATH, "//li[.//*[contains(., 'More items related to')]]/following-sibling::li[@listingid]")
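If you would rather query the page once and split in Python, the same before/after-the-label idea can be applied to an ordered list of item texts. This is a minimal sketch, assuming a hypothetical list item_texts that holds, in page order, the text of each listing li plus the label li; split_at_related_marker is likewise a made-up name. One design difference: when the label is missing (no related items), this returns every item as a search result, whereas the preceding-sibling XPath returns an empty list because its anchor li does not exist.

```python
MARKER = "More items related to"


def split_at_related_marker(item_texts, marker=MARKER):
    """Split page-ordered item texts into (search results, related items).

    Everything before the first item containing `marker` counts as a real
    result and everything after it as related; the marker item itself is
    dropped. If the marker is absent, all items are treated as real results.
    """
    for i, text in enumerate(item_texts):
        if marker in text:
            return list(item_texts[:i]), list(item_texts[i + 1:])
    return list(item_texts), []
```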