esi-data-scrapping / esi-table-data
Commit 7f6e7b0e, authored Jul 26, 2017 by Vasyl Bodnaruk
Add spider for Crunchbase
parent 4bfd5fad

Showing 1 changed file with 24 additions and 0 deletions

exa/exa/spiders/cb.py  0 → 100644  (+24, -0)
# -*- coding: utf-8 -*-
import scrapy

from .base import BaseSpider
from ..items import ExaItem


class CbSpider(BaseSpider):
    name = "cb"
    allowed_domains = ["www.crunchbase.com"]
    start_urls = ['http://www.crunchbase.com/organization/sense-ly/press/']

    def parse(self, response):
        # Skip the first table row (the header).
        rows = response.xpath("//table/tr")[1:]
        print(rows)
        for i in rows:
            item = ExaItem()
            # The leading '.' keeps each XPath relative to the current row;
            # a bare '//' would search the whole page and return the same
            # first match for every row.
            item['date'] = i.xpath(".//td[contains(@class, 'date')]/text()").extract_first()
            item['title'] = i.xpath(".//td/a/text()").extract_first()
            item['url'] = i.xpath(".//td/a/@href").extract_first()
            print(item)