WhiteHatBox
PercyTapia

Find if text on page then scrape value near to it. No unique selectors available

2018/03/04 20:24:42

Hi,

Could you please give me any suggestions for solving this problem?

This is a details page about rental properties.

I need to get the text value next to "Parking", in this case "No Garage". I can select the element fine on one property page, but on other pages it sits in a different place (one row above or below), so the same selection scrapes something else, such as the "Utilities" value.

I can't find any unique classes, IDs, titles or anything for "Parking" and so cannot get the correct value on every page.

Is it possible to check whether "Parking" is on the page and then scrape the value next to it?

Any help would be greatly appreciated.

I cannot give you the URL because the page is behind my client's premium access, but the relevant section of the code is as follows.

<td class="d124m41" colspan="6"><span class="heading2 d124m42">Features </span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m4">
<div class="d124m10"></div></td>
<td class="d124m5"></td>
<td class="d124m6"></td>
<td class="d124m7"></td>
<td class="d124m8"></td>
<td class="d124m9"></td>
<td class="d124m3"></td></tr>




<tr class="d124m21">
<td class="d124m3"></td>
<td class="d124m4"><span class="label">Interior: </span></td>
<td class="d124m43" colspan="5"><span class="formula field d124m23"> No/Unknown Accessibility Modifications, Main Floor Laundry</span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m27" colspan="6"><span class="formula field d124m32"><hr></span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m45"><span class="label">Kitchen: </span></td>
<td class="d124m43" colspan="5"><span class="formula field d124m23">Eat-In Kitchen</span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m27" colspan="6"><span class="formula field d124m32"><hr></span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m4"><span class="label d124m23">Basement: </span></td>
<td class="d124m43" colspan="5"><span class="formula field"> No Basement</span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m27" colspan="6"><span class="formula field d124m32"><hr></span></td>
<td class="d124m3"></td></tr>


<tr class="d124m21"><td class="d124m3"></td>
<td class="d124m4"><span class="label d124m23">Parking: </span></td>
<td class="d124m43" colspan="5"><span class="formula field d124m23">No Garage</span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m27" colspan="6"><span class="formula field d124m32"><hr></span></td>
<td class="d124m3"></td></tr>
<tr class="d124m21">
<td class="d124m3"></td>
<td class="d124m4"><span class="label d124m23">Utilities: </span></td>
<td class="d124m43" colspan="5">&nbsp;<span class="formula field d124m23">Central Air</span></td>
<td class="d124m3"></td></tr>
<tr>
<td class="d124m3"></td>
<td class="d124m27" colspan="6"><span class="formula field d124m32"><hr></span></td>
<td class="d124m3"></td></tr>


Thanks :-)

Percy


Aprilcaicai
2018/03/05 11:10:00

Use this option in the Scrape command to scrape the entire page content, or only the content of the Features table. Then use an If command to check whether the content contains the word "Parking".

PercyTapia
2018/03/05 13:54:21
OK, thanks for the quick reply. With your suggestion I can check whether the word "Parking" is present, but is there any way to scrape the value corresponding to it (e.g. "No Garage") without having to use regex? That is the text I need to scrape. I wouldn't know where to start getting the value using regex. Thanks for your help. :-)
Aprilcaicai
2018/03/05 14:30:05
Try scraping all the content of the Features table row by row (each tr). Then use an If command to check which row contains the word "Parking". Once you have found that row, process its content to get the text next to "Parking".
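For readers who want to see the row-by-row idea outside BotChief, here is a minimal sketch in Python (standard library only). It is not BotChief functionality; the names LabelValueParser and find_value are made up for this illustration, and the sample HTML is copied from the first post. It assumes the label and its value always share one table row.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell texts
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            if self._row is None:   # tolerate a <td> with no opening <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments and collapse whitespace (including &nbsp;).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_value(html, label):
    """Return the first non-empty cell after the cell whose text is `label`."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    wanted = label.rstrip(":").lower()
    for row in parser.rows:
        for i, cell in enumerate(row):
            if cell.rstrip(":").strip().lower() == wanted:
                for following in row[i + 1:]:
                    if following:
                        return following
    return None


sample = """
<tr class="d124m21"><td class="d124m3"></td>
<td class="d124m4"><span class="label d124m23">Parking: </span></td>
<td class="d124m43" colspan="5"><span class="formula field d124m23">No Garage</span></td>
<td class="d124m3"></td></tr>
<tr class="d124m21">
<td class="d124m3"></td>
<td class="d124m4"><span class="label d124m23">Utilities: </span></td>
<td class="d124m43" colspan="5">&nbsp;<span class="formula field d124m23">Central Air</span></td>
<td class="d124m3"></td></tr>
"""

print(find_value(sample, "Parking"))
```

Because the lookup keys on the label text rather than on the row position or a class name, it does not matter whether the "Parking" row moves up or down from page to page.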
PercyTapia
2018/03/17 00:14:17
Hi,
I have been learning regex because I think that is the only way to do it.
But the problem I have now, and would like to have solved for future regex solutions, is that the "regex process" only returns the full match and not the group.
I scrape the whole page and use regex.
Please see the regex code here on regex101.com (https://regex101.com/r/wncKVe/1). It uses part of the page code as test input.

Parking.*\s.*formula\sfield\s.{9}(.*)<\/span><\/td>

On this editor and others I can successfully isolate the parking value in a group match but I can't get BotChief to save the group match to a variable. It only saves the full match and nothing else.
I am using a table so it should save all matches to different lines but it doesn't.
I also tried "variable operate" with the option "Regular expression processing" and the same thing: It only ever returns the full match but not the contents of the group.

I am not that experienced at regex but should BotChief not also return the value inside groups after the full match? One match for every row in the table until all matches are in the table?
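To make the difference between the full match and a group match concrete, here is a small sketch using Python's re module with the pattern above. This only illustrates how a regex engine that exposes groups behaves; it says nothing about what BotChief supports. The sample text is taken from the HTML in the first post.

```python
import re

# Two lines from the page source quoted in the first post.
html = (
    '<td class="d124m4"><span class="label d124m23">Parking: </span></td>\n'
    '<td class="d124m43" colspan="5">'
    '<span class="formula field d124m23">No Garage</span></td>\n'
)

# The pattern from the post, unchanged. The parentheses form group 1.
pattern = re.compile(r'Parking.*\s.*formula\sfield\s.{9}(.*)<\/span><\/td>')

match = pattern.search(html)
if match:
    full_match = match.group(0)   # everything the whole pattern matched
    value = match.group(1)        # only the text inside the parentheses
    print("full match:", repr(full_match))
    print("group 1:", repr(value))
```

Note that `.{9}` only works here because the class name and the closing `">` after "formula field " happen to be exactly nine characters long, so the pattern will break if that class name changes.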

I'm getting closer to solving this and other similar problems.

Thanks :-)

Aprilcaicai
2018/03/21 16:36:50
So sorry for the late reply.

We don't currently provide a function to get the group, but we will consider adding it in the future. Please wait for news.

So sorry for the inconvenience.
PercyTapia
2018/03/24 18:11:07

Yes, please add this. I would think that being able to save group matches is essential if you want to use regex properly. It would be a huge shortcut for some other problems I have had if I could just scrape the whole page and use regex on it.

Looking forward to seeing this implemented as soon as you have time.

Aprilcaicai
2018/03/26 10:40:22

Yes, thank you very much for the great suggestion.


We have already added it to our work list. If our programmers are able to add it, they will release the update ASAP. Please wait for news.
