

Unexpectedly, every other value is being saved twice in loop


I am attempting to scrape the price of items from a website, extracting these values and saving them to CSV, all using iMacros.
I have been successful in creating a looped extraction and save macro, however I am receiving unexpected results. Every second value is being saved twice in the resulting CSV file.
My macro code is as follows:
VERSION BUILD=10022823
TAB T=1
TAB CLOSEALLOTHERS
SET !ERRORIGNORE YES
SET !LOOP 1
SET !DATASOURCE C:\Users\UserName\Documents\URL_List.csv
SET !DATASOURCE_COLUMNS 1
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
SET !TIMEOUT_STEP 10
TAG POS=1 TYPE=DIV ATTR=CLASS:preis EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Extracted_prices.csv
My expected result would be:
$99.99
$89.99
$16.99
What I am getting instead is:
$99.99
$99.99
$89.99
$16.99
$16.99
I can't for the life of me figure out why this is happening. I have consulted the documentation on the iMacros Wiki, to no avail. There are many existing questions here on Stack Overflow about building data scraping and extraction macros, and I consulted several of them while writing the macro above. Nevertheless, I was unable to find anyone experiencing the same problem. I also checked the integrity of my CSV file to make sure there hadn't been any errors in its creation, but I found no irregularities. Am I just missing something simple?
VERSION BUILD=10022823
TAB T=1
TAB CLOSEALLOTHERS
SET !ERRORIGNORE YES
SET !LOOP 1
SET !DATASOURCE C:\Users\UserName\Documents\URL_List.csv
SET !DATASOURCE_COLUMNS 1
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
SET !TIMEOUT_STEP 10
TAG POS=1 TYPE=DIV ATTR=CLASS:preis EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Extracted_prices.csv
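' Clear the extract buffer after saving so the next loop iteration starts empty
SET !EXTRACT NULL
Maybe you should clear the value of the !EXTRACT variable after each SAVEAS, as in the last line above.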
In case my comment gets cleaned up, I'll post my less-than-sufficient "answer".
After toying with this macro and pulling out my hair for a few hours, I switched from the Internet Explorer extension to the iMacros extension for Firefox and, et voilà, everything worked as expected. However, after the first successful run, I received error code -1001 when trying to run the macro again. Apparently, this error occurs when the CSV file referenced in !DATASOURCE is improperly encoded: the CSV file apparently must be saved with UTF-8 encoding.
It's perhaps a long shot, but the unexpected results I was getting in the IE extension may have been related to encoding. Edit: A quick test produced the same output from the IE extension, even after saving the source CSV as UTF-8.
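Since error -1001 is attributed to a !DATASOURCE file that is not UTF-8 encoded, here is a minimal Python sketch for re-saving the URL list as UTF-8. It assumes the original file was saved in Windows-1252 ("ANSI"), which is only a guess; pass whatever encoding your editor actually used. The function name and the output path are hypothetical, and this sketch does not address whether iMacros additionally expects a byte-order mark.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Copy src_path to dst_path, converting its text from source_encoding to UTF-8.

    The cp1252 default is an assumption (a common Windows "ANSI" code page).
    """
    # Decode with the encoding the file was originally saved in;
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    # Write the same characters back out as UTF-8 (no byte-order mark).
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Hypothetical usage, then point !DATASOURCE at the new file:
# reencode_to_utf8(r"C:\Users\UserName\Documents\URL_List.csv",
#                  r"C:\Users\UserName\Documents\URL_List_utf8.csv")
```

Writing to a separate file rather than overwriting in place keeps the original around in case the assumed source encoding turns out to be wrong.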
Hopefully this can be of help to some poor soul!
