[ Scrapy: traversing a document ]

This is a mock-up of part of a document I'm working with. What I'm trying to do is first find the Time and Cost elements, and from there find their respective values. I've tried various axis selectors but haven't got anywhere. I don't want to go directly to the Time and Cost values; I need to find them in relation to their associated h4s.

<ul class="events">
  <li id="event-123456" class="eventItem">
    <div class="details">
      <h4>Time</h4>
      <div>
        <p>17:00</p>
      </div>
      <h4>Cost</h4>
      <div>
        <p>10.00</p>
      </div>
    </div>
  </li>
  <li id="event-678901" class="eventItem">
    <div class="details">
      <h4>Time</h4>
      <div>
        <p>21:00</p>
      </div>
      <h4>Cost</h4>
      <div>
        <p>20.00</p>
      </div>
    </div>
  </li>
</ul>

This is the skeleton of the parser:

def parse(self, response):
    events = response.xpath('//ul')
    for event in events:
        item['cost'] = event.xpath(???)
        item['time'] = event.xpath(???)

Answer 1


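The following-sibling axis would help here. Iterate over the li elements rather than the ul, and start each inner expression with a dot so it is evaluated relative to the current event: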

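# Select each <li> so that every event is processed on its own
events = response.xpath('//ul[@class = "events"]/li')
for event in events:
    item = MyItem()  # your own Item class with 'cost' and 'time' fields

    # The leading "." keeps the search inside the current <li>;
    # extract_first() returns the text of the first matching <p>
    item['cost'] = event.xpath(".//h4[. = 'Cost']/following-sibling::div/p/text()").extract_first()
    item['time'] = event.xpath(".//h4[. = 'Time']/following-sibling::div/p/text()").extract_first()

    yield item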