Thread: curl script for downloading web pages

  1. #1
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    11,459
    Thanks
    1,539
    Thanked
    1,022 times in 868 posts

    curl script for downloading web pages

    I've not used curl in this way before, so I could use a little help. I want to download some podcast transcripts (and learn how to write scripts for curl), but instead of downloading them one by one I'm hoping to write a script that fetches them all in one go. Let's say they are located at http://exampleurl.com/transcript0001.html to http://exampleurl.com/transcript0200.html.
    Thanks!

  2. #2
    cat /dev/null streetster's Avatar
    Join Date
    Jul 2003
    Location
    London
    Posts
    4,138
    Thanks
    119
    Thanked
    100 times in 82 posts
    • streetster's system
      • Motherboard:
      • Asus P7P55D-E
      • CPU:
      • Intel i5 750 2.67 @ 4.0Ghz
      • Memory:
      • 4GB Corsair XMS DDR3
      • Storage:
      • 2x1TB Drives [RAID0]
      • Graphics card(s):
      • 2xSapphire HD 4870 512MB CrossFireX
      • PSU:
      • Corsair HX520W
      • Case:
      • Coolermaster Black Widow
      • Operating System:
      • Windows 7 x64
      • Monitor(s):
      • DELL U2311
      • Internet:
      • Virgin 50Mb

    Can't you stick a wget command in a for loop? In bash, something like:

    for ((i = 1; i <= 200; i++)); do
      wget "http://exampleurl.com/transcript$(printf '%04d' "$i").html"
    done
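
    Since the thread title asks about curl: curl can expand a numeric range written in square brackets by itself, so a loop is not strictly needed. Below is a minimal sketch, assuming the example URLs from the first post and a POSIX shell. The variable names are made up for illustration, and the script only prints the command rather than downloading anything.

```shell
#!/bin/sh
# Assumed values, taken from the example URLs in the first post.
base="http://exampleurl.com/transcript"
first=1
last=200
suffix=".html"

# Build a curl range glob such as transcript[0001-0200].html.
# %04d zero-pads each bound to four digits, matching the file names.
glob=$(printf '%s[%04d-%04d]%s' "$base" "$first" "$last" "$suffix")

# Print the command for inspection instead of running it.
printf 'curl -O "%s"\n' "$glob"
```

    Running the printed command should fetch every page in the range. -O saves each page under its remote file name, and the quotes stop the shell from interpreting the brackets itself.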

  3. Received thanks from:

    watercooled (09-10-2009)

  4. #3
    Gentoo Ricer
    Join Date
    Jan 2005
    Location
    Galway
    Posts
    11,048
    Thanks
    1,016
    Thanked
    944 times in 704 posts
    • aidanjt's system
      • Motherboard:
      • Asus Strix Z370-G
      • CPU:
      • Intel i7-8700K
      • Memory:
      • 2x8GB Corsiar LPX 3000C15
      • Storage:
      • 500GB Samsung 960 EVO
      • Graphics card(s):
      • EVGA GTX 970 SC ACX 2.0
      • PSU:
      • EVGA G3 750W
      • Case:
      • Fractal Design Define C Mini
      • Operating System:
      • Windows 10 Pro
      • Monitor(s):
      • Asus MG279Q
      • Internet:
      • 240mbps Virgin Cable

    Yup, no reason why not.
    Quote Originally Posted by Agent View Post
    ...every time Creative bring out a new card range their advertising makes it sound like they have discovered a way to insert a thousand Chuck Norris super dwarfs in your ears...

  5. Received thanks from:

    watercooled (09-10-2009)

  6. #4
    Senior Member watercooled's Avatar

    Yeah I didn't think of that TBH, thanks!

  7. #5
    Senior Member watercooled's Avatar

    Only just tried this now and for some reason I'm getting "i was unexpected at this time". What am I doing wrong?

  8. #6
    cat /dev/null streetster's Avatar

    What shell are you using?

    Something like this should work for csh:

    Code:
    set i = 1
    set url = "http://www.domain.com/the-url-"
    set html = ".html"
    # loop over pages 1 to 200 inclusive
    while ($i <= 200)
      wget "$url$i$html"
      @ i++
    end
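
    The message "i was unexpected at this time" looks like the kind of parser error the Windows command prompt (cmd.exe) gives, which would explain why a Unix-style loop fails there. For anyone running a POSIX shell (Linux, macOS, Cygwin or similar), here is a minimal sketch that also zero-pads the page number, which the csh loop above does not do. The function name transcript_urls and the example URL are assumptions for illustration. It only prints the URLs and does not download anything.

```shell
#!/bin/sh
# Hypothetical helper: print one URL per line for pages 0001..0200.
transcript_urls() {
  i=1
  while [ "$i" -le 200 ]; do
    # %04d pads the counter with leading zeros (1 -> 0001).
    printf 'http://exampleurl.com/transcript%04d.html\n' "$i"
    i=$((i + 1))
  done
}

# Show the first three URLs as a sanity check.
transcript_urls | head -n 3
```

    To download, the list can be fed to wget with "transcript_urls | wget -i -", or to curl with "transcript_urls | xargs -n 1 curl -O".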

  9. Received thanks from:

    watercooled (12-10-2009)

  10. #7
    YUKIKAZE arthurleung's Avatar
    Join Date
    Feb 2005
    Location
    Aberdeen
    Posts
    3,280
    Thanks
    8
    Thanked
    88 times in 83 posts
    • arthurleung's system
      • Motherboard:
      • Asus P5E (Rampage Formula 0902)
      • CPU:
      • Intel Core2Quad Q9550 3.6Ghz 1.2V
      • Memory:
      • A-Data DDR2-800 2x2GB CL4
      • Storage:
      • 4x1TB WD1000FYPS @ RAID5 3Ware 9500S-8 / 3x 1TB Samsung Ecogreen F2
      • Graphics card(s):
      • GeCube HD4870 512MB
      • PSU:
      • Corsair VX450
      • Case:
      • Antec P180
      • Operating System:
      • Windows Server 2008 Standard
      • Monitor(s):
      • Dell Ultrasharp 2709W + 2001FP
      • Internet:
      • Be*Unlimited 20Mbps

    If I were doing it, I would just use EmEditor's find function to pull out all instances of
    Code:
    (http://.*?\.html)
    then replace with
    Code:
    wget \1
    and copy the result to the command window.

    That is if you have a random list of URLs. Otherwise, if it's a numbered list, I'd just use Excel to make the list.
    My favourite combo is Excel + EmEditor. Instead of writing a program to crunch data, a few tricks with this combo will do the same job in 1/10 the time.
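
    The same find-and-replace can be done from a Unix shell. This is a rough sketch, assuming a grep that supports -o (GNU and BSD grep do) and links that contain no quotes, spaces or angle brackets. The function name and the sample input are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin, print one "wget <url>" line per .html link.
links_to_wget() {
  # grep -o prints only the matching part of each line;
  # the bracket expression stops a match at quotes, spaces and angle brackets.
  # sed then prefixes every extracted URL with "wget ".
  grep -o 'http://[^"<> ]*\.html' | sed 's/^/wget /'
}

# Made-up sample input standing in for a saved index page.
printf '%s\n' '<a href="http://exampleurl.com/transcript0001.html">Episode 1</a>' \
  | links_to_wget
```

    With a real page, something like "links_to_wget < index.html > fetch.sh" would collect the commands into a file (index.html and fetch.sh are placeholder names).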
    Workstation 1: Intel i7 950 @ 3.8Ghz / X58 / 12GB DDR3-1600 / HD4870 512MB / Antec P180
    Workstation 2: Intel C2Q Q9550 @ 3.6Ghz / X38 / 4GB DDR2-800 / 8400GS 512MB / Open Air
    Workstation 3: Intel Xeon X3350 @ 3.2Ghz / P35 / 4GB DDR2-800 / HD4770 512MB / Shuttle SP35P2
    HTPC: AMD Athlon X4 620 @ 2.6Ghz / 780G / 4GB DDR2-1000 / Antec Mini P180 White
    Mobile Workstation: Intel C2D T8300 @ 2.4Ghz / GM965 / 3GB DDR2-667 / DELL Inspiron 1525 / 6+6+9 Cell Battery

    Display (Monitor): DELL Ultrasharp 2709W + DELL Ultrasharp 2001FP
    Display (Projector): Epson TW-3500 1080p
    Speakers: Creative Megaworks THX550 5.1
    Headphones: Etymotic hf2 / Ultimate Ears Triple.fi 10 Pro

    Storage: 8x2TB Hitachi @ DELL PERC 6/i RAID6 / 13TB Non-RAID Across 12 HDDs
    Consoles: PS3 Slim 120GB / Xbox 360 Arcade 20GB / PS2

  11. Received thanks from:

    watercooled (12-10-2009)

  12. #8
    Senior Member watercooled's Avatar

    Sorry, but again I've not got round to trying this yet. I've tried using Excel to make the list, but if I copy and paste the first few URLs and then drag the list down, it just repeats them instead of incrementing the number before .html. Is there something else I should be doing? Probably something really simple I'm missing, TBH...
    Thanks again!!!

  13. #9
    YUKIKAZE arthurleung's Avatar

    Quote Originally Posted by watercooled
    Sorry, but again I've not got round to trying this yet. I've tried using Excel to make the list, but if I copy and paste the first few URLs and then drag the list down, it just repeats them instead of incrementing the number before .html. Is there something else I should be doing? Probably something really simple I'm missing, TBH...
    Thanks again!!!

    Unfortunately, Excel is not as smart as you think it is.

    Say you have the url as
    Code:
    http://website.com/whateverfolder/abc001.html
    http://website.com/whateverfolder/abc002.html
    http://website.com/whateverfolder/abc003.html
    You can do it a couple of ways. Here are the two easier ones I would use:

    1. Split the URL into "http://website.com/whateverfolder/abc", "001" and ".html" across three columns in Excel, then fill column B down so it increments and set its number format to custom, "000". When you're done, copy the whole list into Notepad (or your favourite editor) and search and replace <tab> with nothing. Then search for "http" and replace it with "wget http".

    This is basically just giving Excel a hand...

    2. Make a quick list of numbers in Excel, say 1-200, set the format to "000", copy the list into a decent text editor, then perform a regexp replace:

    Find
    Code:
    ^(.*?)$
    Replace with
    Code:
    wget http://website.com/whateverfolder/abc\1.html
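
    Option 2 also maps onto a short shell pipeline, for anyone without Excel to hand. This is a sketch using the same made-up URL as the example above: awk prints the padded numbers and sed performs the regexp replace. The function name wget_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit "wget <url>" lines for abc001.html .. abc200.html.
wget_lines() {
  # Step 1 (awk): the number list 001..200, i.e. the Excel column with format "000".
  # Step 2 (sed): wrap each whole line, captured as \1, in the wget command.
  awk 'BEGIN { for (i = 1; i <= 200; i++) printf "%03d\n", i }' |
    sed 's|^\(.*\)$|wget http://website.com/whateverfolder/abc\1.html|'
}

# Preview the last two generated commands.
wget_lines | tail -n 2
```

    The output can be saved to a file and run as a script, for example "wget_lines > fetch.sh" followed by "sh fetch.sh" (fetch.sh is a placeholder name).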

  14. Received thanks from:

    watercooled (20-10-2009)

  15. #10
    Senior Member watercooled's Avatar

    Thanks very much - cracked it
