esi-data-scrapping / esi-table-data
Commit 8bdee0a5 authored Jul 26, 2017 by Vasyl Bodnaruk
Add get_media function for select media
If media doesn't exist in DB - need to create media and return id
parent 0019f323
Showing 1 changed file with 12 additions and 1 deletion:
exa/exa/spiders/cb.py +12 -1
```diff
@@ -14,11 +14,22 @@ class CbSpider(BaseSpider):
     def parse(self, response):
         rows = response.xpath("//table/tr")[1:]
-        print(rows)
         for i in rows:
             item = ExaItem()
             item['date'] = i.xpath("./td[contains(@class, 'date')]/text()").extract_first()
             item['title'] = i.xpath("./td/a/text()").extract_first()
             item['url'] = i.xpath("./td/a/@href").extract_first()
+            item['media_id'] = self._get_media(i)
             print(item)
+
+    def _get_media(self, elem):
+        media_name = elem.xpath("./td[contains(@class, 'article')]/span/text()").extract_first()
+        media_url = elem.xpath("./td/a/@data_publisher").extract_first()
+        query = "select * from wp_esi_media where name like '%{}%' or url like '%{}%'".format(media_name, media_url)
+        media = self.pipeline.db.select(query)
+        if len(media) == 0:
+            media = self.pipeline.db.insert("INSERT INTO wp_esi_media (name, url) VALUES(%s, %s)", (media_name, media_url))
+        else:
+            media = media[0][0]
+        return media
```
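The `query` line in `_get_media` splices the scraped `media_name` and `media_url` directly into the SQL text with `str.format`. A publisher name containing a quote can therefore break or rewrite the statement, and any `%` or `_` in the scraped text acts as a LIKE wildcard. The following is a minimal sketch of the same substring lookup using bound parameters and escaped wildcards. It makes several assumptions. Python's built-in `sqlite3` stands in for whatever driver sits behind `self.pipeline.db`, which this commit does not show. The diff's `%s` placeholders indicate a different parameter style from `sqlite3`'s `?`, and the exact `ESCAPE` syntax varies by database. The table is assumed to have an `id` column. The helper names `escape_like` and `find_media_id` are hypothetical.

```python
import sqlite3
from typing import Optional

ESCAPE_CHAR = "\\"  # a single backslash, used as the LIKE escape character


def escape_like(value: str) -> str:
    """Escape the LIKE wildcards % and _ (and the escape char itself) so value matches literally."""
    return (
        value.replace(ESCAPE_CHAR, ESCAPE_CHAR * 2)
        .replace("%", ESCAPE_CHAR + "%")
        .replace("_", ESCAPE_CHAR + "_")
    )


def find_media_id(conn: sqlite3.Connection,
                  media_name: Optional[str],
                  media_url: Optional[str]) -> Optional[int]:
    """Return the id of a wp_esi_media row whose name or url contains the given text, else None."""
    conditions, params = [], []
    for column, value in (("name", media_name), ("url", media_url)):
        # Skip missing values (extract_first() may return None): an empty
        # pattern '%%' would match every row.
        if value:
            # Only the fixed column name is placed in the SQL text.
            conditions.append(f"{column} LIKE ? ESCAPE '\\'")
            params.append("%" + escape_like(value) + "%")
    if not conditions:
        return None
    sql = "SELECT id FROM wp_esi_media WHERE " + " OR ".join(conditions) + " LIMIT 1"
    # The scraped text is sent as bound parameters and never becomes SQL.
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None
```

Only the two fixed column names are interpolated into the statement. A name such as `x' OR '1'='1` is then searched for as literal text instead of changing the query.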
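The commit message describes a get-or-create: look the media up, and if it is missing, create it and return its id. In the diff, the two branches return differently shaped values. The `else` branch returns `media[0][0]`, the first column of a `select *` row. The `if` branch returns whatever `self.pipeline.db.insert(...)` returns, and that depends on a wrapper not shown in this commit. The following is a minimal sketch in which both paths return the integer id. It uses Python's built-in `sqlite3` as a stand-in for the real database and assumes `id` is an `INTEGER PRIMARY KEY`. For brevity, it matches on exact equality rather than the diff's substring `LIKE`. `get_or_create_media` is a hypothetical name.

```python
import sqlite3
from typing import Optional


def get_or_create_media(conn: sqlite3.Connection,
                        media_name: Optional[str],
                        media_url: Optional[str]) -> int:
    """Return the id of the matching wp_esi_media row, inserting a new row first if none matches."""
    # Exact-match lookup; the diff uses a substring LIKE instead.
    row = conn.execute(
        "SELECT id FROM wp_esi_media WHERE name = ? OR url = ? LIMIT 1",
        (media_name, media_url),
    ).fetchone()
    if row is not None:
        return row[0]  # existing media: return its id
    cur = conn.execute(
        "INSERT INTO wp_esi_media (name, url) VALUES (?, ?)",
        (media_name, media_url),
    )
    conn.commit()
    # lastrowid is the rowid of the new row, which equals id for an INTEGER PRIMARY KEY.
    return cur.lastrowid
```

Select-then-insert is not atomic. Two crawls running at once can both miss the `SELECT` and insert the same media twice. The usual guard is a `UNIQUE` constraint on the matched column, with the insert falling back to a lookup on conflict.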