
Thread: HTML gubbins stripper?

  1. #1
    Mike Fishcake
    Guest

    HTML gubbins stripper?

    Can anyone recommend a decent HTML clean-up program to get rid of the HTML nonsense that MS Word inserts? I've got a few Word docs that need HTMLification, and Word just puts loads of unnecessary code in there.

  2. #2
    Senior Member Kezzer's Avatar
    Join Date
    Sep 2003
    Posts
    4,863
    Thanks
    12
    Thanked
    5 times in 5 posts
    How would a program know what to get rid of and what to keep?

  3. #3
    Mike Fishcake
    Guest
    It would need to have a telepathic link into my brain.

    :-P

    What I mean is something that converts this, output from Word:

    Code:
    <html>
    
    <head>
    <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
    <meta name=Generator content="Microsoft Word 11 (filtered)">
    <title>This is an html test page for word</title>
    <style>
    <!--
     /* Style Definitions */
     p.MsoNormal, li.MsoNormal, div.MsoNormal
    	{margin:0cm;
    	margin-bottom:.0001pt;
    	font-size:12.0pt;
    	font-family:"Times New Roman";}
    @page Section1
    	{size:612.0pt 792.0pt;
    	margin:39.7pt 39.7pt 39.7pt 39.7pt;}
    div.Section1
    	{page:Section1;}
    -->
    </style>
    
    </head>
    
    <body lang=EN-US>
    
    <div class=Section1>
    
    <p class=MsoNormal><b><span lang=EN-GB>This</span></b><span lang=EN-GB> is an </span><span
    lang=EN-GB style='font-size:14.0pt'>html</span><span lang=EN-GB> test page for <span
    style='color:red'>word</span>.</span></p>
    
    </div>
    
    </body>
    
    </html>
    to something like this:

    Code:
    <html>
    
    <head>
    <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
    <title>This is an html test page for word</title>
    </head>
    
    <body>
    
    <p><b>This</b> is an <font size="4">html</font> test page for <font color="#FF0000">word</font>.</p>
    
    </body>
    
    </html>
    or something similar. It doesn't have to be as complex as that, I'm just showing it to illustrate the point.
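
    One way a program could "know what to keep" is a whitelist: keep a short list of structural tags, throw away every attribute, and drop <style> blocks and comments wholesale. Below is a rough sketch of that idea using only Python's standard html.parser module. The names KEEP_TAGS, KEEP_ATTRS, WordCleaner and clean_word_html are made up for illustration, and the whitelist itself is an assumption to adjust to taste. Note that this sketch simply discards font size and colour rather than mapping them to <font> tags, and it drops every <meta> tag, including the charset one.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags to keep. Anything else (span, div, meta, o:p ...)
# is unwrapped, i.e. the tag is dropped but the text inside it survives.
KEEP_TAGS = {"html", "head", "title", "body", "p", "b", "i", "u", "br",
             "ul", "ol", "li", "h1", "h2", "h3", "table", "tr", "td", "th", "a"}
# Attributes to keep, per tag. Everything else (class, lang, style ...) goes.
KEEP_ATTRS = {"a": {"href"}}
# Tags whose entire content is thrown away.
DROP_WITH_CONTENT = {"style", "script"}
VOID_TAGS = {"br"}


class WordCleaner(HTMLParser):
    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return
        allowed = KEEP_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    # Comments and doctypes are dropped: the default handlers do nothing.


def clean_word_html(markup: str) -> str:
    """Return markup with non-whitelisted tags and attributes removed."""
    cleaner = WordCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)
```

    Feeding the <p class=MsoNormal> paragraph from the Word output above through clean_word_html() should leave just the <p> and <b> tags around the text. It is no substitute for a proper cleaner, but it shows that the "telepathy" is mostly a list of tags someone decided were worth keeping.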

  4. #4
    Sublime HEXUS.net
    Join Date
    Jul 2003
    Location
    The Void.. Floating
    Posts
    11,819
    Thanks
    213
    Thanked
    233 times in 160 posts
    • Stoo's system
      • Motherboard:
      • Mac Pro
      • CPU:
      • 2*Xeon 5450 @ 2.8GHz, 12MB Cache
      • Memory:
      • 32GB 1600MHz FBDIMM
      • Storage:
      • ~ 2.5TB + 4TB external array
      • Graphics card(s):
      • ATI Radeon HD 4870
      • Case:
      • Mac Pro
      • Operating System:
      • OS X 10.7
      • Monitor(s):
      • 24" Samsung 244T Black
      • Internet:
      • Zen Max Pro
    Because it's cleverer than you

    Fishcake: Wordcleaner - http://www.zapadoo.com/wordcleaner/index.htm or http://textism.com/wordcleaner/

    for a start
    (\__/)
    (='.'=)
    (")_(")

  5. #5
    Sublime HEXUS.net
    Join Date
    Jul 2003
    Location
    The Void.. Floating
    Posts
    11,819
    Thanks
    213
    Thanked
    233 times in 160 posts

  6. #6
    Mike Fishcake
    Guest
    Cheers Stoo, I'll check those out!

  7. #7
    Dark side super agent
    Join Date
    Dec 2003
    Location
    Nirvana
    Posts
    1,895
    Thanks
    72
    Thanked
    99 times in 89 posts
    If you've got Dreamweaver, it has a built-in HTML stripper designed specifically to take out all the crap that Word puts in.

  8. #8
    Senior Members' Member Matt1eD's Avatar
    Join Date
    Feb 2005
    Location
    London
    Posts
    2,462
    Thanks
    0
    Thanked
    0 times in 0 posts
    • Matt1eD's system
      • Motherboard:
      • MSI K9N6SGM-V GeForce 6100
      • CPU:
      • Athlon 64 LE-1620 2.41GHz
      • Memory:
      • 2 GB DDR2
      • Storage:
      • 1.25 TB
      • Graphics card(s):
      • Onboard
      • PSU:
      • eBuyer Extra Value 500W!
      • Operating System:
      • XP Pro
    Didn't know Dreamweaver did that. Haven't used Word for web design for ages though.

  9. #9
    HEXUS.net Webmaster
    Join Date
    Jul 2003
    Location
    UK
    Posts
    3,108
    Thanks
    1
    Thanked
    0 times in 0 posts
    Don't use Word for web design ever.....

  10. #10
    Mike Fishcake
    Guest
    Don't have Dreamweaver, unfortunately... I don't design pages in Word, but this was for transferring a document that was already in Word format into HTML.

    All done now anyway

  11. #11
    Senior Member
    Join Date
    Jul 2003
    Posts
    1,066
    Thanks
    1
    Thanked
    0 times in 0 posts
    Quote Originally Posted by Iain
    Don't use Word for web design ever.....
    Got there before me!

    Translating something from Word? Not a task I'd relish.

  12. #12
    Sublime HEXUS.net
    Join Date
    Jul 2003
    Location
    The Void.. Floating
    Posts
    11,819
    Thanks
    213
    Thanked
    233 times in 160 posts
    Quote Originally Posted by RoGuE|SaBeR
    Translating something from Word? Not a task I'd relish.
    I've had to do that several times, mainly HR documents that need converting before they're put up on the intranet at work...

    Takes *ages* even with a word-cleaner..

  13. #13
    Ah, Mrs. Peel! mike_w's Avatar
    Join Date
    Oct 2003
    Location
    Hertfordshire, England
    Posts
    3,326
    Thanks
    3
    Thanked
    9 times in 7 posts
    If you use Linux, I've found that AbiWord does a good job of converting text to XHTML. I haven't used it extensively, but it is certainly better than Word (although I still use a text editor to do it all).
    "Well, there was your Uncle Tiberius who died wrapped in cabbage leaves but we assumed that was a freak accident."

  14. #14
    lazy student nvening's Avatar
    Join Date
    Jan 2005
    Location
    London
    Posts
    4,656
    Thanks
    196
    Thanked
    31 times in 30 posts
    Why would anyone build a site with anything other than Notepad or equivalent?
    IT'S CRAZY TALK!!!

  15. #15
    Ah, Mrs. Peel! mike_w's Avatar
    Join Date
    Oct 2003
    Location
    Hertfordshire, England
    Posts
    3,326
    Thanks
    3
    Thanked
    9 times in 7 posts
    Quote Originally Posted by nvening
    Why would anyone build a site with anything other than Notepad or equivalent?
    IT'S CRAZY TALK!!!
    Because they don't want to learn HTML? Because they don't have the time? Because they can't learn HTML? If you do it all yourself, things can go wrong quite easily, and things can happen that you didn't expect, and take up a lot of time to solve or correct. Some people just want to build a website that works, in which case a WYSIWYG editor comes in very handy!
    "Well, there was your Uncle Tiberius who died wrapped in cabbage leaves but we assumed that was a freak accident."
