Commit 726f3aca authored by Я

init this one

.Rproj.user
.Rhistory
.RData
.Ruserdata
# 2. Select graphs
# 2.1. Countries with more than 100 companies
gg_q1_1 <- df_q1() %>%
filter(sum > 100) %>%
ggplot(aes(reorder(country, sum), sum, fill=factor(sum))) +
ggtitle("Countries with more than 100 companies") +
ylab("Number of companies") +
xlab("Countries") +
geom_bar(stat="identity") +
scale_fill_discrete(name = "Number") +
coord_flip()
# 2.2. Countries with 10-100 companies
gg_q1_2 <- df_q1() %>%
filter(sum %in% 10:100) %>%
ggplot(aes(reorder(country, sum), sum, fill=factor(sum))) +
ggtitle("Countries with 10-100 companies") +
xlab("Countries") +
ylab("Number of companies") +
geom_bar(stat="identity") +
scale_fill_discrete(name = "Number") +
theme_bw() +
coord_flip()
# 2.3. Countries with 1-10 companies
gg_q1_3 <- df_q1() %>%
filter(sum %in% 1:10) %>%
ggplot(aes(reorder(country, sum), sum, fill=factor(sum))) +
ggtitle("Countries with 1-10 companies") +
xlab("Countries") +
ylab("Number of companies") +
geom_bar(stat="identity") +
scale_fill_discrete(name = "Number") +
coord_flip()
# Select graph
if(input$q1_choice == "> 100"){ g <- gg_q1_1 }
if(input$q1_choice == "10-100"){ g <- gg_q1_2 }
if(input$q1_choice == "1-10"){ g <- gg_q1_3 }
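The three filters and the closing `if` chain above can be folded into a single lookup. The following is only a minimal sketch in base R, assuming a plain data frame with `country` and `sum` columns; `select_bucket` and the sample rows are hypothetical and not part of the app. It keeps the boundaries of the original filters, so a count of exactly 10 still falls into both "10-100" and "1-10".

```r
# Hypothetical helper: pick the rows of a country/count table for one UI choice.
select_bucket <- function(df, choice) {
  keep <- switch(choice,
    "> 100"  = df$sum > 100,
    "10-100" = df$sum %in% 10:100,
    "1-10"   = df$sum %in% 1:10,
    stop("unknown choice: ", choice)  # default branch for unexpected input
  )
  df[keep, , drop = FALSE]
}

# Made-up sample data, only for illustration.
counts <- data.frame(
  country = c("A", "B", "C", "D"),
  sum = c(250, 100, 10, 3),
  stringsAsFactors = FALSE
)

big    <- select_bucket(counts, "> 100")
medium <- select_bucket(counts, "10-100")
small  <- select_bucket(counts, "1-10")
```

The filtered data frame could then be passed to a single `ggplot()` call whose title is built from `choice`, instead of keeping three near-identical plot definitions.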
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: knitr
LaTeX: pdfLaTeX
#
# Downloading Data
# https://shiny.rstudio.com/gallery/file-download.html
#
# Q: Shiny app: downloadHandler does not produce a file
# A: Note the download button does not work in the RStudio viewer.
# Your friend might be using the RStudio viewer to view the app.
# If that is the case, please open the app in the external web browser
# (there is a drop-down list on the right of the "Run App" button:
# Run in Window, Run in Viewer Pane, Run External; choose the last one).
#
# UI ----------------------------------------------------------------------
ui <- navbarPage("",
fluidPage(
titlePanel('Downloading Data'),
sidebarLayout(
sidebarPanel(
selectInput("dataset", "Choose a dataset:",
choices = c("rock", "pressure", "cars")),
downloadButton('downloadData', 'Download')
),
mainPanel(
tableOutput('table')
)
)
)
)
# Server ------------------------------------------------------------------
server <- function(input, output, session) {
datasetInput <- reactive({
switch(input$dataset,
"rock" = rock,
"pressure" = pressure,
"cars" = cars)
})
output$table <- renderTable({
datasetInput()
})
output$downloadData <- downloadHandler(
filename = function() {
paste(input$dataset, '.csv', sep='')
},
content = function(file) {
write.csv(datasetInput(), file)
}
)
}
shinyApp(ui = ui, server = server)
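What the `content` function of the download handler writes can be checked outside Shiny. The sketch below is an illustration under assumptions: `dataset_by_name` is a hypothetical stand-in for the `datasetInput` reactive, and a temporary file plays the role of the `file` argument. It also shows that `write.csv()` adds a row-name column unless `row.names = FALSE` is passed.

```r
# Hypothetical stand-in for the datasetInput() reactive.
dataset_by_name <- function(name) {
  switch(name,
    rock = datasets::rock,
    pressure = datasets::pressure,
    cars = datasets::cars,
    stop("unknown dataset: ", name)
  )
}

# Default call, as in the app: row names are written as an extra first column.
with_names <- tempfile(fileext = ".csv")
write.csv(dataset_by_name("cars"), with_names)
df_with_names <- read.csv(with_names)

# Same data without the row-name column.
without_names <- tempfile(fileext = ".csv")
write.csv(dataset_by_name("cars"), without_names, row.names = FALSE)
df_without_names <- read.csv(without_names)
```

Whether the extra column is wanted depends on who consumes the downloaded file; for a plain data export `row.names = FALSE` is a common choice.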
New Tags
--------
HealthCare
HealthCare Insurance
HealthCare Diagnostics
HealthCare IT Solutions
HealthCare
-----------
Health Care
Healthcare
Health Care Plans
Health Maintenance
Home Health Care
Home Health Care
Personal Health
Specialized Health Services
HealthCare Insurance
--------------------
Health Insurance
Accident & Health Insurance
HealthCare Diagnostics
----------------------
Health Diagnostics
HealthCare IT Solutions
-----------------------
Healthcare & Technologies
Healthcare Information Services
mHealth
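The grouping above amounts to a lookup from old tag names to new ones. As a rough sketch in base R, assuming tags arrive as a character vector: `new_tags`, `tag_map` and `normalize_tag` are hypothetical names, and tags that are not listed are passed through unchanged.

```r
# New tag -> old tags, copied from the notes above.
new_tags <- list(
  "HealthCare" = c("Health Care", "Healthcare", "Health Care Plans",
                   "Health Maintenance", "Home Health Care",
                   "Personal Health", "Specialized Health Services"),
  "HealthCare Insurance" = c("Health Insurance", "Accident & Health Insurance"),
  "HealthCare Diagnostics" = c("Health Diagnostics"),
  "HealthCare IT Solutions" = c("Healthcare & Technologies",
                                "Healthcare Information Services", "mHealth")
)

# Invert the list into a named vector: old tag -> new tag.
tag_map <- setNames(
  rep(names(new_tags), lengths(new_tags)),
  unlist(new_tags, use.names = FALSE)
)

# Hypothetical helper: replace known old tags, keep everything else as-is.
normalize_tag <- function(x) {
  mapped <- unname(tag_map[x])
  ifelse(is.na(mapped), x, mapped)
}

normalized <- normalize_tag(c("mHealth", "Health Insurance", "Robotics"))
```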
"id","name","parent"
1,"Business development",0
2,"Channel development / management",1
3,"Customer acquisition",1
4,"New service development",1
5,"Partnership development",1
6,"Support functions",0
7,"Accounting department",6
8,"Internal audit",6
9,"Management control",6
10,"Billing",6
11,"Payment collection",6
12,"Human resources",6
13,"Information Services",6
14,"Procurement department",6
15,"Sales & marketing",0
16,"Product marketing",15
17,"Promotion",15
18,"Sales",15
19,"Advertising",15
20,"Channel management",15
21,"Customer relationship management",15
22,"Research and Development",0
23,"Production & Quality",0
24,"Product manufacturing",23
25,"Distribution",23
26,"Service delivery",23
27,"Quality Assurance",23
28,"Aftermarket",0
29,"Customer service",28
30,"Maintenance",28
31,"Recycling",28
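In this file a `parent` of 0 marks a top-level function and any other value points at the `id` of the parent row. A small sketch, using a handful of rows copied from the table above, counts the sub-functions under each top-level entry; the variable names are illustrative only.

```r
# A few rows copied from the function table above.
functions <- read.csv(text = '"id","name","parent"
22,"Research and Development",0
28,"Aftermarket",0
29,"Customer service",28
30,"Maintenance",28
31,"Recycling",28', stringsAsFactors = FALSE)

# Top-level rows have parent == 0.
top <- functions[functions$parent == 0, ]

# For each top-level id, count the rows that point at it.
child_counts <- vapply(top$id, function(i) sum(functions$parent == i), integer(1))
names(child_counts) <- top$name
```

Here "Research and Development" has no children, which matches the full table, where it is the only top-level function without sub-functions.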
"id","name","parent"
1,"Basic Materials",0
2,"Agricultural Chemicals",1
3,"Agricultural raw products (grain, oil, juice, etc)",1
4,"Aluminum",1
5,"Chemicals - Major Diversified",1
6,"Copper",1
7,"Gold",1
8,"Independent Oil & Gas",1
9,"Industrial Metals & Minerals",1
10,"Major Integrated Oil & Gas",1
11,"Nonmetallic Mineral Mining",1
12,"Oil & Gas Drilling & Exploration",1
13,"Oil & Gas Equipment & Services",1
14,"Oil & Gas Pipelines",1
15,"Oil & Gas Refining & Marketing",1
16,"Silver",1
17,"Specialty Chemicals",1
18,"Steel & Iron",1
19,"Synthetics",1
20,"Consumer Goods",0
21,"Appliances",20
22,"Automotive",20
23,"Beverages - Brewers",20
24,"Beverages - Soft Drinks",20
25,"Beverages - Wineries & Distillers",20
26,"Business Equipment",20
27,"Cigarettes",20
28,"Cleaning Products",20
29,"Confectioners",20
30,"Dairy Products",20
31,"Electronic Equipment",20
32,"Farm Products",20
33,"Food - Major Diversified",20
34,"Home Furnishings & Fixtures",20
35,"Housewares & Accessories",20
36,"Meat Products",20
37,"Office Supplies",20
38,"Packaging & Containers",20
39,"Paper & Paper Products",20
40,"Personal Products",20
41,"Photographic Equipment & Supplies",20
42,"Processed & Packaged Goods",20
43,"Recreational Goods, Other",20
44,"Recreational Vehicles",20
45,"Rubber & Plastics",20
46,"Sporting Goods",20
47,"Textile - Apparel Clothing",20
48,"Textile - Apparel Footwear & Accessories",20
49,"Tobacco Products, Other",20
50,"Toys & Games",20
51,"Trucks & Other Vehicles",20
52,"Finance",0
53,"Accident & Health Insurance",52
54,"Asset Management",52
55,"Closed-End Fund - Debt",52
56,"Closed-End Fund - Equity",52
57,"Closed-End Fund - Foreign",52
58,"Credit Services",52
59,"Diversified Investments",52
60,"Foreign Money Center Banks",52
61,"Foreign Regional Banks",52
62,"Insurance Brokers",52
63,"Investment Brokerage - National",52
64,"Investment Brokerage - Regional",52
65,"Life Insurance",52
66,"Money Center Banks",52
67,"Mortgage Investment",52
68,"Property & Casualty Insurance",52
69,"Property Management",52
70,"Real Estate Fund",52
71,"Real Estate Development",52
72,"Regional Banks",52
73,"Savings & Loans",52
74,"Safety & Personal Insurance",52
75,"Healthcare",0
76,"Biotechnology",75
77,"Diagnostic Substances & Systems",75
78,"Drug Delivery",75
79,"Drug Manufacturers - Major",75
80,"Drug Manufacturers - Other",75
81,"Drug Related Products",75
82,"Drugs - Generic",75
83,"Health Care Plans",75
84,"Home Health Care",75
85,"Hospitals",75
86,"Long-Term Care Facilities",75
87,"Medical Appliances & Equipment",75
88,"Medical Instruments & Supplies",75
89,"Medical Laboratories & Research",75
90,"Medical Practitioners",75
91,"Specialized Health Services",75
92,"Industrial Goods",0
93,"Aerospace/Defense - Major Diversified",92
94,"Aerospace/Defense Products & Services",92
95,"Cement",92
96,"Diversified Machinery",92
97,"Farm & Construction Machinery",92
98,"General Building Materials",92
99,"General Contractors",92
100,"Heavy Construction",92
101,"Industrial Electrical Equipment",92
102,"Industrial Equipment & Components",92
103,"Lumber, Wood Production",92
104,"Machine Tools & Accessories",92
105,"Manufactured Housing",92
106,"Metal Fabrication",92
107,"Pollution & Treatment Controls",92
108,"Residential Construction",92
109,"Small Tools & Accessories",92
110,"Textile Industrial",92
111,"Waste Management",92
112,"Distribution",0
113,"Apparel Retail",112
114,"Auto Dealerships",112
115,"Auto Parts Retail",112
116,"Auto Parts Wholesale",112
117,"Basic Materials Wholesale",112
118,"Building Materials Wholesale",112
119,"Catalog & Mail Order Houses",112
120,"Computers Wholesale",112
121,"Department Retail",112
122,"Discount, Variety Retail",112
123,"Drug Retail",112
124,"Drugs Wholesale",112
125,"Ecommerce, general",112
126,"Electronics Wholesale",112
127,"Electronics Retail",112
128,"Food Wholesale",112
129,"Grocery Retail",112
130,"Home Furnishing Retail",112
131,"Home Improvement Retail",112
132,"Industrial Equipment Wholesale",112
133,"Jewelry retail",112
134,"Medical Equipment Wholesale",112
135,"Music & Video Retail",112
136,"Specialty Retail, Other",112
137,"Sporting Goods Retail",112
138,"Toy & Hobby Retail",112
139,"Wine Retail",112
140,"Wholesale, Other",112
141,"Media & entertainment",0
142,"Advertising Agencies",141
143,"Broadcasting - Radio",141
144,"Broadcasting - TV",141
145,"Entertainment - Diversified",141
146,"Gaming Activities",141
147,"General Entertainment",141
148,"Movie Production, Theaters",141
149,"Publishing - Books",141
150,"Publishing - Newspapers",141
151,"Publishing - Periodicals",141
152,"Resorts & Casinos",141
153,"Services",0
154,"Aftermarket Services",153
155,"Business Services",153
156,"Consumer Services",153
157,"Education & Training Services",153
158,"Lodging",153
159,"Management Services",153
160,"Marketing Services",153
161,"Personal Services",153
162,"Rental & Leasing Services",153
163,"Research Services",153
164,"Restaurants",153
165,"Security & Protection Services",153
166,"Specialty Eateries",153
167,"Sporting Activities",153
168,"Staffing & Outsourcing Services",153
169,"Technical Services",153
170,"Transport & logistics",0
171,"Airlines",170
172,"Air Delivery & Freight Services",170
173,"Air Services, Other",170
174,"Bus transportation Services",170
175,"B2B logistics",170
176,"B2C logistics",170
177,"Freight forwarding",170
178,"Last mile delivery",170
179,"Railway / Train transportation",170
180,"Shipping",170
181,"Trucking",170
182,"Warehousing",170
183,"Technology",0
184,"Application Software",183
185,"Business Software & Services",183
186,"Communication Equipment",183
187,"Computer Based Systems",183
188,"Healthcare Information Services",183
189,"Information & Delivery Services",183
190,"Information Technology Services",183
191,"Internet players",183
192,"Internet Service Providers",183
193,"Multimedia & Graphics Software",183
194,"Networking & Communication Devices",183
195,"Personal Computers",183
196,"Printed Circuit Boards",183
197,"Processing Systems & Products",183
198,"Scientific & Technical Instruments",183
199,"Security Software & Services",183
200,"Semiconductor",183
201,"Technical & System Software",183
202,"Telecom Services",183
203,"Telecom Equipment",183
204,"Utilities",0
205,"Diversified Utilities",204
206,"Power Utilities",204
207,"Gas Utilities",204
208,"Water Utilities",204
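Because each industry row stores only the `id` of its parent, the parent's name has to be looked up in the same table. One possible way, sketched here in base R with four rows copied from the table above, is a self-lookup with `match()`; the column name `parent_name` is an assumption for illustration.

```r
# A few rows copied from the industry table above.
industry <- read.csv(text = '"id","name","parent"
1,"Basic Materials",0
7,"Gold",1
75,"Healthcare",0
85,"Hospitals",75', stringsAsFactors = FALSE)

# match() returns NA for parent == 0, so top-level rows get an NA parent name.
industry$parent_name <- industry$name[match(industry$parent, industry$id)]
```

The same lookup would work on the function table, since it uses the identical `id`/`name`/`parent` layout.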
<basic tables>
wp_esi_tag
wp_esi_technology
wp_esi_industry
wp_esi_function
<entity cross-tables>
wp_esi_tag_entity << wp_esi_entity_tag << better rename
wp_esi_technology_entity << wp_esi_entity_technology << better rename
wp_esi_entity_industry == wp_esi_entity_industry
? == wp_esi_entity_function
<news cross-tables>
wp_esi_tag_news << wp_esi_news_tag
wp_esi_technology_news << wp_esi_news_technology
wp_esi_news_industry == wp_esi_news_industry
? == wp_esi_news_function
<radar cross-tables>
wp_esi_radar_industry == wp_esi_radar_industry
...
-------------------------------
nrow(df_technology) = 38
nrow(df_industry) = 208
nrow(df_function) = 31
nrow(df_entity_technology) = 8104
nrow(df_entity_industry) = 12355
#
# select companies by tags
#
library(RMySQL)
#library(sqldf)
library(feather)
library(dplyr)
library(reshape2)
library(lubridate)
# 1. Connect to db
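# The password is read from an environment variable (ESI_DB_PASSWORD is an
# assumed name) so that it is not stored in the repository
mydb <- dbConnect(MySQL(), user='analyst', password=Sys.getenv("ESI_DB_PASSWORD"),
dbname='esi_management',
host='lecanaldb.c12hbxfn3xzn.eu-west-1.rds.amazonaws.com',
port=3306)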
# 2. List of tables
#dbListTables(mydb)
# 3. Load tables
# 3.1. Load one "table" function
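db_load <- function(table){
  # dbGetQuery sends the query, fetches all rows and clears the result set
  df <- dbGetQuery(mydb, paste0("select * from ", table))
  # cache the table on disk
  write_feather(df, paste0("data/", table))
  # return the data frame explicitly so that the assignments below receive it
  df
}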
# 3.2. Load all tables
df_entity <- db_load("wp_esi_entity")
df_tag <- db_load("wp_esi_tag")
df_tag_entity <- db_load("wp_esi_tag_entity")
# 3.3. Feature Engineering
df_entity <- df_entity %>%
subset(select = c(id, name, country, city))
# 4. Save as a cached file
#write_feather(df, paste0("data/", table))
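The caching step hinted at above could follow a "read the file if it exists, otherwise load and save" pattern. The sketch below is an assumption-heavy illustration: `load_cached` and `fake_loader` are hypothetical, the loader stands in for the database query, and base R's `saveRDS()`/`readRDS()` are swapped in for feather so the example runs without extra packages.

```r
# Hypothetical helper: return the cached copy if present,
# otherwise call the loader and store its result.
load_cached <- function(path, loader) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  df <- loader()
  saveRDS(df, path)
  df
}

# Stand-in for a database query; counts how often it is actually called.
calls <- 0
fake_loader <- function() {
  calls <<- calls + 1
  data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
}

cache_path <- tempfile(fileext = ".rds")
first  <- load_cached(cache_path, fake_loader)  # runs the loader, writes the cache
second <- load_cached(cache_path, fake_loader)  # served from the cache file
```

With feather, the same shape would use `read_feather()` and `write_feather()` in place of the RDS calls.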
# ------------------------------------------------------------------------
# entity_id: 474
# tag_id: 56, 77
f <- read_feather("entities_tags_selected")
unique_tags <- unique(f$tag_id)
f_and <- f %>%
group_by(entity_id) %>%
summarise(n = n()) %>%
filter(n == length(unique_tags))
# # OR
# f_or <- f %>%
# filter(entity_id %in% f$entity_id)
# f_and <- df_entity %>%
# filter(id %in% f$entity_id)
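The AND selection above keeps an entity only when its row count equals the number of selected tags, which assumes each (entity, tag) pair appears once. A hedged variant using `n_distinct()` tolerates duplicate rows; the sample data below is made up around the entity and tag ids mentioned in the comments (474, 56, 77), and the extra entities 501 and 502 are hypothetical.

```r
library(dplyr)

# Made-up cross-table rows; entity 474 has tag 77 listed twice on purpose.
f <- data.frame(
  entity_id = c(474, 474, 474, 501, 502),
  tag_id    = c(56, 77, 77, 56, 77)
)

unique_tags <- unique(f$tag_id)

# AND: entities that carry every selected tag (duplicates counted once).
f_and <- f %>%
  group_by(entity_id) %>%
  summarise(n = n_distinct(tag_id)) %>%
  filter(n == length(unique_tags))

# OR: entities that carry at least one of the selected tags.
f_or <- f %>%
  distinct(entity_id)
```

With `n()` instead of `n_distinct(tag_id)`, entity 474 would be counted three times and dropped from the AND result in this sample.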
#
# select companies by tech
#
library(shinydashboard)
library(RMySQL)
#library(sqldf)
library(feather)
library(ggplot2)
library(dplyr)
library(reshape2)
library(lubridate)
library(plotly)
# 1. Connect to db
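# The password is read from an environment variable (ESI_DB_PASSWORD is an
# assumed name) so that it is not stored in the repository
mydb <- dbConnect(MySQL(), user='analyst', password=Sys.getenv("ESI_DB_PASSWORD"),
dbname='esi_management',
host='lecanaldb.c12hbxfn3xzn.eu-west-1.rds.amazonaws.com',
port=3306)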
# 2. List of tables
dbListTables(mydb)
# 3. Load tables
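db_load <- function(table){
  # dbGetQuery sends the query, fetches all rows and clears the result set
  df <- dbGetQuery(mydb, paste0("select * from ", table))
  # cache the table on disk
  write_feather(df, paste0("data/", table))
  # return the data frame explicitly so that the assignments below receive it
  df
}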
# 3.2. Load all tables
# <main>
df_entity <- db_load("wp_esi_entity")
# basic for cross
df_tag <- db_load("wp_esi_tag")
df_technology <- db_load("wp_esi_technology")
df_industry <- db_load("wp_esi_industry")
df_function <- db_load("wp_esi_function")
# cross-tables
df_entity_tag <- db_load("wp_esi_tag_entity")
df_entity_technology <- db_load("wp_esi_technology_entity")
df_entity_industry <- db_load("wp_esi_entity_industry")
# stats
nrow(df_technology)
write.csv(df_technology, "technology.csv", row.names = F)
nrow(df_industry)
write.csv(df_industry, "industry.csv", row.names = F)
nrow(df_entity_technology)
write.csv(df_entity_technology, "entity_technology.csv", row.names = F)
nrow(df_function)
write.csv(df_function, "function.csv", row.names = F)
nrow(df_entity_industry)
write.csv(df_entity_industry, "entity_industry.csv", row.names = F)
# Merge ------------------------------------------------------------------
# Add "source" feature
df_tag$source <- "tag"
df_technology$source <- "technology"
df_industry$description <- NA
df_industry$source <- "industry"
df_function$description <- NA
df_function$source <- "function"
# Merge all
df <- df_tag %>%
rbind(df_technology) %>%
rbind(df_industry) %>%
rbind(df_function) %>%
arrange(name)
write.csv(df, "tag_tech_ind_func.csv", row.names = F)
# # 4. Save as a cached file
# write_feather(df, paste0("data/", table))
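The merge step above relies on `rbind()` matching data frame columns by name, which is why the missing `description` column is added first. A minimal sketch with a few rows taken from the technology and industry tables (the reduced tables are for illustration only):

```r
# Technology rows already have a description column.
df_technology <- data.frame(
  id = c(2, 9),
  name = c("Artificial Intelligence", "Deep learning"),
  description = c("", ""),
  parent = c(0, 2),
  stringsAsFactors = FALSE
)

# Industry rows do not, so the column is added before binding.
df_industry <- data.frame(
  id = c(75, 85),
  name = c("Healthcare", "Hospitals"),
  parent = c(0, 75),
  stringsAsFactors = FALSE
)

df_technology$source <- "technology"
df_industry$description <- NA_character_
df_industry$source <- "industry"

# Columns are matched by name, so their different order does not matter.
merged <- rbind(df_technology, df_industry)
merged <- merged[order(merged$name), ]
```

Without the added `description` column, `rbind()` would stop with an error because the column sets differ.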
#
#
#
df_investment <- read_feather("data/wp_esi_investment")
colnames(df_investment)
# "entity_id", "investment_date", "amount", "currency"
df_entity <- read_feather("data/wp_esi_entity")
#left_join(test_data, kantrowitz, by = c("first_name" = "name"))
f <-
left_join(df_entity, df_investment, by = c("id" = "entity_id")) %>%
subset(select = c("id", "name", "description", "country", "city",
"founded_in", "employees","total_raised",
"amount", "currency", "investment_date")) %>%
mutate(
country = ifelse(country=="", "N/A", country),
description = paste0(substr(description, 1, 20), "..."),
investment_date = as.Date(investment_date)
)
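The effect of the join and clean-up above can be seen on a toy example. This sketch assumes made-up entities and investments (none of the values come from the database) and adds one hedged variation: the description is only shortened when it is longer than 20 characters.

```r
library(dplyr)

# Made-up sample tables; entity 2 has no investments and no country.
df_entity <- data.frame(
  id = c(1, 2),
  name = c("Acme", "Globex"),
  country = c("FR", ""),
  description = c("A long description of the first company", "Short"),
  stringsAsFactors = FALSE
)
df_investment <- data.frame(
  entity_id = c(1, 1),
  investment_date = c("2016-03-01", "2017-06-15"),
  amount = c(1e6, 2.5e6),
  stringsAsFactors = FALSE
)

f <- left_join(df_entity, df_investment, by = c("id" = "entity_id")) %>%
  mutate(
    country = ifelse(country == "", "N/A", country),
    # shorten only descriptions that exceed 20 characters
    description = ifelse(nchar(description) > 20,
                         paste0(substr(description, 1, 20), "..."),
                         description),
    investment_date = as.Date(investment_date)
  )
```

A left join keeps every entity: one row per investment, and a single row with `NA` investment fields for entities that have none.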
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: knitr
LaTeX: pdfLaTeX
"id","name","description","parent"
1,"Computer science","",0
2,"Artificial Intelligence","",0
3,"Computational linguistics","",0
4,"Cloud computing","",0
5,"Cybernetics","",0
6,"Modal logic","",0
7,"Computer Vision","",0
8,"Data Science","",0
9,"Deep learning","",2
10,"Autonomous motion","",2
11,"Marine engineering","",0
12,"General Engineering","",0
13,"Acoustical engineering","",0
14,"Automotive engineering","",0
15,"Chemical engineering","",0
16,"Control engineering","",0
17,"Electrical engineering","",0
18,"Electronic engineering","",0
19,"Mechanical engineering","",0
20,"Mechatronics engineering","",0
21,"Microelectromechanical engineering","",0
22,"Nanoengineering","",0
23,"Optical engineering","",0
24,"Safety engineering","",0
25,"Software engineering","",0
26,"Telecommunications","",0
27,"Hydraulics","",0
28,"Pneumatics","",0
29,"Machine Learning","",2
30,"Natural Language Processing","",2
31,"Data Mining","",8
32,"Big data","",8
33,"CyberSecurity","",0
34,"Analytics","",0
35,"Predictive Analytics","",34
36,"FinTech","",0
37,"Blockchain","",36
38,"Speech Recognition","",30
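Unlike the other tables, the technology table nests more than one level deep (for example Speech Recognition under Natural Language Processing under Artificial Intelligence). A possible way to print the full chain, sketched in base R with three rows from the table above: `path_to_root` is a hypothetical helper that assumes the parent links contain no cycles, and `trimws()` guards against stray whitespace in names.

```r
# Three rows from the technology table above.
technology <- read.csv(text = '"id","name","description","parent"
2,"Artificial Intelligence","",0
30,"Natural Language Processing","",2
38," Speech Recognition","",30', stringsAsFactors = FALSE)

# Quoted fields keep their leading space, so trim names explicitly.
technology$name <- trimws(technology$name)

# Hypothetical helper: follow parent ids until a top-level row (parent 0).
path_to_root <- function(df, id) {
  path <- character(0)
  while (!is.na(id) && id != 0) {
    row <- match(id, df$id)
    if (is.na(row)) break          # dangling parent id: stop walking
    path <- c(df$name[row], path)  # prepend so the root comes first
    id <- df$parent[row]
  }
  paste(path, collapse = " > ")
}

speech_path <- path_to_root(technology, 38)
```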