spider help

Postby jpoveda on Wed Sep 02, 2009 4:48 pm

Hello,

I'm trying to get movie info from http://www.filmaffinity.com. I made a spider called filmaffinity.com.txt with this content:

limit=0
url=http://www.filmaffinity.com/es/search.php?stext=%searchstring%&stype=all
results=<b><a href="(?<url>/es/.*?)".*?>(?<display>.*?)</a>

To try it, I searched for the movie "terminator", but I can't see any of the five results I expected.

The HTML that the regular expression has to match is:

<a class="addl" href="/es/edtmovielists.php?movie_id=517417&rp=%2Fes%2Fsearch.php%3Fstext%3Dterminator%26stype%3Dall">Añadir a listas</a> <b><a href="/es/film517417.html">Terminator Salvation</a></b> (2009) <img src="/imgs/countries/US.jpg" title="Estados Unidos" border="0" align="middle"></td></tr>

Could the problem be that the URL is relative (/es/film517417.html) instead of absolute (http://www.filmaffinity.com/es/film517417.html)?
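
One way to check the pattern outside xlobby is to run it against the row quoted above. This is only a rough sketch in Python, under two assumptions: Python's re module spells named groups (?P<name>...) instead of the (?<name>...) form used in the spider file, and xlobby's regex engine may behave differently. RESULTS_PATTERN, SAMPLE_HTML and find_results are names made up for this example.

```python
import re

# The results= pattern from the spider, with the named groups rewritten in
# Python's (?P<name>...) spelling (assumption: otherwise the same regex).
RESULTS_PATTERN = re.compile(
    r'<b><a href="(?P<url>/es/.*?)".*?>(?P<display>.*?)</a>'
)

# The search-result row quoted above, shortened to the part that matters.
SAMPLE_HTML = (
    '<a class="addl" href="/es/edtmovielists.php?movie_id=517417">'
    'Añadir a listas</a> '
    '<b><a href="/es/film517417.html">Terminator Salvation</a></b> (2009)'
)


def find_results(html):
    """Return (url, display) pairs for every match of the results pattern."""
    return [
        (m.group("url"), m.group("display"))
        for m in RESULTS_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    for url, display in find_results(SAMPLE_HTML):
        print(url, "->", display)
```

If this prints the relative path and the movie title, the pattern itself matches the row, so the cause would have to lie somewhere else.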

Can anybody help me?

Thanks in advance.
jpoveda
 
Posts: 111
Joined: Mon May 17, 2004 6:45 am
Location: Spain

Re: spider help

Postby slaman on Thu Sep 03, 2009 5:57 am

The relative URL is definitely the problem. I think you may need to use the replace= function, but unfortunately I've tried it to no avail (see my question in the IMDB spider thread).
slaman
 
Posts: 145
Joined: Sat Oct 14, 2006 10:30 pm

Re: spider help

Postby jpoveda on Thu Sep 03, 2009 7:44 am

Hello slaman,

Fortunately, it isn't: xlobby recognizes relative URLs. I had to restart xlobby for the spider changes to take effect. It's more or less working now.

limit=0
url=http://www.filmaffinity.com/es/advsearch.php?stext=%searchstring%&stype[]=title&genre=&country=&fromyear=&toyear=
results=<b><a href="(?<url>/es/.*?)">(?<display>.*?)</a>

//title and year
//<title>(?<Title>.*?) \((?<Genre>.*?)\) \((?<Year>.*?)\) - FilmAffinity</title>
<title>(?<Title>.*?) \((?<Year>.*?)\) - FilmAffinity</title>

//genre
//<td valign="top">(?<Genre>.*?) SINOPSIS:

//plot
//(?<genre>.*?)
SINOPSIS: (?<Plot>.*?) \(FILMAFFINITY\)

//coverart
//href="(?<coverart>http://pics.filmaffinity.com/.*?jpg)"
<img src="(?<coverart>http://pics.filmaffinity.com/.*?jpg)"

//director
<a href="search.php\?stype=director&stext=.*?">(?<Director>.*?)</a>

//actors
<a href="search.php\?stype=cast&stext=.*?">(?<Actors>.*?)</a>

//runtime
//<td>(?<runtime>.*?) min.</td>

//rating
<tr><td align="center" style="color:#990000; font-size:22px; font-weight: bold;">(?<Rating>.*?)</td></tr>

I will read your post and then post the final code.
jpoveda
 
Posts: 111
Joined: Mon May 17, 2004 6:45 am
Location: Spain

Re: spider help

Postby jpoveda on Fri Sep 04, 2009 10:26 am

Hello,

This is my filmaffinity spider. You can easily modify it to support other languages.

Regards.

limit=0
url=http://www.filmaffinity.com/es/advsearch.php?stext=%searchstring%&stype[]=title&genre=&country=&fromyear=&toyear=
results=<b><a href="(?<url>/es/.*?)">(?<display>.*?)</a>

//title and year
<title>(?<Title>.*?) \(.*?\) - FilmAffinity</title>

//year
<td valign="baseline" align="right" ><b>A.O<.*?<td >(?<Year>.*?)</td>

//plot
SINOPSIS: (?<Plot>.*?) \(FILMAFFINITY\)

//coverart
<a class="lightbox" href="(?<coverart>http://pics.filmaffinity.com/.*?jpg)"
<img src="(?<coverart>http://pics.filmaffinity.com/.*?jpg)"

//director
<a href="search.php\?stype=director&stext=.*?">(?<Director>.*?)</a>

//actors
<a href="search.php\?stype=cast&stext=.*?">(?<Actors>.*?)</a>

//runtime
<b>DURACI.N</b>.*?<td>.*?<td>(?<Runtime>.*?)</td>

//rating
<tr><td align="center" style="color:#990000; font-size:22px; font-weight: bold;">(?<Rating>.*?)</td></tr>
jpoveda
 
Posts: 111
Joined: Mon May 17, 2004 6:45 am
Location: Spain

Re: spider help

Postby slaman on Sat Sep 12, 2009 2:44 am

Wow that was a great spider! Thanks! I added genre to the list and have replaced all my existing movie spiders - good find with the site!
slaman
 
Posts: 145
Joined: Sat Oct 14, 2006 10:30 pm

Re: spider help

Postby jpoveda on Mon Sep 14, 2009 6:19 am

Hello,

Note that the spider tries to get the best cover available.

//coverart
<a class="lightbox" href="(?<coverart>http://pics.filmaffinity.com/.*?jpg)"
<img src="(?<coverart>http://pics.filmaffinity.com/.*?jpg)"

So you will get either a low-resolution or a high-resolution cover, depending on which pattern matches.
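
A rough sketch of that fallback idea in Python, for anyone who wants to test the two patterns outside xlobby. It assumes a simple "first pattern that matches wins" rule, which may not be exactly how xlobby handles two patterns with the same group name. The group syntax is Python's (?P<name>...), and COVER_PATTERNS, pick_cover and the sample HTML fragments are made up for illustration.

```python
import re

# The two coverart patterns from the spider, in the order they are listed
# (named groups rewritten in Python's (?P<name>...) spelling).
COVER_PATTERNS = [
    re.compile(r'<a class="lightbox" href="(?P<coverart>http://pics.filmaffinity.com/.*?jpg)"'),
    re.compile(r'<img src="(?P<coverart>http://pics.filmaffinity.com/.*?jpg)"'),
]


def pick_cover(html):
    """Return the first cover URL found, trying the patterns in listed order."""
    for pattern in COVER_PATTERNS:
        match = pattern.search(html)
        if match:
            return match.group("coverart")
    return None


# Hypothetical page fragments, not copied from the real site.
PAGE_WITH_LIGHTBOX = (
    '<a class="lightbox" href="http://pics.filmaffinity.com/example-large.jpg">'
    '<img src="http://pics.filmaffinity.com/example-small.jpg"></a>'
)
PAGE_WITHOUT_LIGHTBOX = '<img src="http://pics.filmaffinity.com/example-small.jpg">'

if __name__ == "__main__":
    print(pick_cover(PAGE_WITH_LIGHTBOX))
    print(pick_cover(PAGE_WITHOUT_LIGHTBOX))
```

Under that assumption, a page with a lightbox link yields the large image, and a page without one falls back to the small image.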

How do you get the genre? I had a lot of trouble with the "()" in the title section.
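
On the "()" trouble: in most regex flavors an unescaped parenthesis opens a group, so the literal brackets around the year have to be written \( and \), as the title pattern in the spider does. A minimal sketch in Python follows, assuming the same regex with Python's (?P<name>...) group spelling. TITLE_PATTERN, extract_title and the sample titles are made up for this example.

```python
import re

# Title pattern from the spider in Python group syntax. \( and \) match
# literal parentheses; the unescaped (?P<Title>...) is a capturing group.
TITLE_PATTERN = re.compile(
    r"<title>(?P<Title>.*?) \(.*?\) - FilmAffinity</title>"
)


def extract_title(html):
    """Return the text before the first ' (' in the page title, or None."""
    match = TITLE_PATTERN.search(html)
    return match.group("Title") if match else None


if __name__ == "__main__":
    print(extract_title("<title>Terminator Salvation (2009) - FilmAffinity</title>"))
    # Hypothetical title with an extra parenthesized part before the year.
    print(extract_title("<title>Example (TV) (2009) - FilmAffinity</title>"))
```

Because .*? is lazy, the captured title stops at the first " (". With the hypothetical "Example (TV) (2009)" title, that means only "Example" is captured and the "(TV)" part is dropped.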
jpoveda
 
Posts: 111
Joined: Mon May 17, 2004 6:45 am
Location: Spain

Re: spider help

Postby slaman on Tue Sep 15, 2009 2:22 pm

//genre
<td valign="baseline" align="right" ><b>GENRE.*?<td>(?<genre>.*?)</td>

That's all I used for genre...

But I think I'm going to keep my IMDB.com spider for movie info and use the one you posted for movie covers. If I have something like "posters/test.jpg", how can I store it in a variable with "http://www.xyz.com/" added in front of it? I'm trying to get the IMPAwards.com spider working again.
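
For reference, the general idea of turning a relative path into an absolute URL can be sketched in Python with the standard urllib.parse.urljoin. This runs outside xlobby and says nothing about what the replace= function can do. make_absolute is a made-up helper name, and http://www.xyz.com/ is just the placeholder from the question.

```python
from urllib.parse import urljoin


def make_absolute(base_url, relative_path):
    """Resolve relative_path against base_url, the way a browser would."""
    return urljoin(base_url, relative_path)


if __name__ == "__main__":
    # Path without a leading slash: appended to the base directory.
    print(make_absolute("http://www.xyz.com/", "posters/test.jpg"))
    # Path with a leading slash: resolved from the site root.
    print(make_absolute("http://www.filmaffinity.com/es/search.php",
                        "/es/film517417.html"))
```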
slaman
 
Posts: 145
Joined: Sat Oct 14, 2006 10:30 pm