X Ray Scraper: Manipulate data before .write


I'm fiddling around with some scraping and need to manipulate some of the data before writing it to my JSON file.
var Xray = require('x-ray');
var x = Xray();

x('http://myUrl.com', '#search_results div div a', [{
  title: '.responsive_search_name_combined .search_name .title',
  price: '.col.search_price.responsive_secondrow'
}])
  .paginate('.search_pagination_right a.pagebtn:last-child@href')
  .limit(10)
  .write('data.json');
When saved, price looks like this: "price": "\r\n\t\t\t\t\t\t\t\t13,99€\t\t\t\t\t\t\t".
I guess it's because there's a lot of whitespace (tabs and line breaks) inside div.col.search_price.responsive_secondrow:
<div class="col search_price responsive_secondrow">
9,99€ </div>
So my question is: Would it be possible to manipulate the data before .write?
Yes. Instead of chaining .write, call the function returned by x(...) with a callback. The callback receives an error as its first argument and the result of your scrape as its second, so you have full control over any post-processing before you write the file yourself.
So your code would end up something like:
var fs = require('fs');

x('http://myUrl.com', '#search_results div div a', [{
  title: '.responsive_search_name_combined .search_name .title',
  price: '.col.search_price.responsive_secondrow'
}])(function (err, products) {
  if (err) throw err;

  var cleanedProducts = [];
  products.forEach(function (product) {
    var cleanedProduct = {};
    // trim() strips the leading and trailing tabs and line breaks
    cleanedProduct.price = product.price.trim();
    // etc.
    cleanedProducts.push(cleanedProduct);
  });

  // write out results.json 'manually'
  fs.writeFile('results.json', JSON.stringify(cleanedProducts), function (err) {
    if (err) throw err;
  });
});
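If you have more fields than just the price, the per-item cleanup can be pulled out into a small helper that trims every string field. This is only a sketch: cleanProducts is a made-up name, not part of x-ray, and the sample item (including its title) is invented for illustration, with the price modelled on the value shown in the question.

```javascript
// Hypothetical helper (not part of x-ray): returns a copy of each scraped
// item with surrounding whitespace stripped from every string field.
function cleanProducts(products) {
  return products.map(function (product) {
    var cleaned = {};
    Object.keys(product).forEach(function (key) {
      var value = product[key];
      // Leave non-string values (numbers, nested objects) untouched.
      cleaned[key] = typeof value === 'string' ? value.trim() : value;
    });
    return cleaned;
  });
}

// Invented sample input, shaped like the scrape result in the question.
var sample = [{ title: ' Some Game ', price: '\r\n\t\t\t\t13,99€\t\t\t' }];
console.log(JSON.stringify(cleanProducts(sample)));
// prints [{"title":"Some Game","price":"13,99€"}]
```

Inside the scrape callback you would then pass JSON.stringify(cleanProducts(products)) to fs.writeFile. As far as I know, x-ray also accepts a filters option on Xray({...}) that lets you write selectors such as 'selector | trim', which may let you keep using .write, but check the README of the version you have installed.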
