
Thread: Clustering commodity PC hardware - A web log

  1. #17
    OMG!! PWND!!
    Join Date
    Dec 2003
    Location
    In front of computer
    Posts
    964
    Thanks
    0
    Thanked
    0 times in 0 posts
    get it to run pifast

  2. #18
    HEXUS.timelord. Zak33's Avatar
    Join Date
    Jul 2003
    Location
    I'm a Jessie
    Posts
    34,357
    Thanks
    2,628
    Thanked
    2,711 times in 1,706 posts
    • Zak33's system
      • Storage:
      • Kingston HyperX SSD, Hitachi 1Tb
      • Graphics card(s):
      • Nvidia 1060
      • PSU:
      • Coolermaster 800w
      • Case:
      • Silverstone Fortress FT01
      • Operating System:
      • Win10
      • Internet:
      • Zen FTC uber speedy
    right...getting it now (yeah...1%). So it's RAID but..er......RAIC... (if I'm the first person to think of that I'd like royalties pls)

    IS the software that runs it self-written, or is there a de facto set of software developers who create it?

    And in either case IS IT a network....CAT5 cable and stuff? Or does it slot together like interconnectable motherboards? Or is it a kinda "blade" that slots like a daughterboard into a big motherboard?

    I'm well intrigued; sounds very cool indeed.

    Quote Originally Posted by Advice Trinity by Knoxville
    "The second you aren't paying attention to the tool you're using, it will take your fingers from you. It does not know sympathy." |
    "If you don't gaffer it, it will gaffer you" | "Belt and braces"

  3. #19
    Banned
    Join Date
    Sep 2003
    Location
    Midlands
    Posts
    8,629
    Thanks
    24
    Thanked
    268 times in 188 posts
    Trying to take me on at my own game I see, Rys: getting people involved with a project and getting them posting, a la the HEXUS living document.

    Rest assured, I shall dig deep and rise to the challenge.



    Seriously though mate, it must be enough trouble to do, let alone making posts telling everyone about it. Nice work.

  4. #20
    Rys
    Rys is offline
    Tiled
    Join Date
    Jul 2003
    Location
    Abbots Langley
    Posts
    1,479
    Thanks
    0
    Thanked
    2 times in 1 post

    Thumbs up

    Quote Originally Posted by Zak33
    right...getting it now (yeah...1%). So it's RAID but..er......RAIC... (if I'm the first person to think of that I'd like royalties pls)

    IS the software that runs it self-written, or is there a de facto set of software developers who create it?

    And in either case IS IT a network....CAT5 cable and stuff? Or does it slot together like interconnectable motherboards? Or is it a kinda "blade" that slots like a daughterboard into a big motherboard?

    I'm well intrigued; sounds very cool indeed.
    The software can be self-written if you know what you're doing, and in many cases existing software can be adapted to a clustered computing environment without much bother, too. There are companies and software teams out there that specialise in creating applications just for cluster setups, but that doesn't have to be the case.

    Then you can take something like video encoding, a task that lends itself well to distribution in a clustered system and that most people can understand. Encoding can go a lot faster on a cluster, with the workload distributed to all the systems in the cluster.
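
    As a rough illustration of what distributing that workload can look like, here's a minimal sketch in C using MPI. It isn't the software running on this cluster; the frame count and the encode_frames() helper are invented for the example, standing in for a real encoder.
    Code:
    /* Hypothetical sketch: splitting an encode job evenly across the ranks
     * of an MPI program, one rank per compute node. */
    #include <mpi.h>
    #include <stdio.h>

    #define TOTAL_FRAMES 7200            /* assumed job size, for illustration */

    static void encode_frames(int first, int last)
    {
        /* Stand-in for the real per-frame encoding work. */
        printf("encoding frames %d..%d\n", first, last - 1);
    }

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Give each rank an even share; the last rank takes the remainder. */
        int chunk = TOTAL_FRAMES / size;
        int first = rank * chunk;
        int last  = (rank == size - 1) ? TOTAL_FRAMES : first + chunk;

        encode_frames(first, last);

        /* Wait for every rank to finish its share before declaring success. */
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("all %d frames encoded across %d ranks\n", TOTAL_FRAMES, size);

        MPI_Finalize();
        return 0;
    }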

    As far as connecting them goes, my cluster is currently only connected via regular network cable (CAT5) into a 100Mbit Ethernet switch. And the cluster will run just fine over that. It's hardware most people have already.

    I also have a Myrinet interconnect for the compute nodes, which has much (orders of magnitude!) lower latency than any Ethernet variant, allowing the nodes to talk to each other faster, and higher bandwidth (Myrinet 1000, which I have, is 1.28Gbit/sec) than even gigabit Ethernet. So once the Myrinet is configured, the compute nodes will be able to talk to each other faster and at a higher rate.
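
    To make the latency point concrete, the usual way to measure it is a ping-pong between two nodes. The sketch below is a generic MPI version, not the GM tooling used on this cluster; the message size and repeat count are arbitrary choices, and it assumes an MPI layer is available over whichever interconnect is being tested.
    Code:
    /* Hypothetical sketch: average one-way latency between two MPI ranks.
     * Run with exactly two ranks, e.g. one per compute node. */
    #include <mpi.h>
    #include <stdio.h>

    #define REPS  1000
    #define BYTES 8

    int main(int argc, char **argv)
    {
        int rank, size;
        char buf[BYTES] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "needs two ranks\n");
            MPI_Finalize();
            return 1;
        }

        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();

        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            /* Each iteration is a round trip, so halve it for one-way latency. */
            double usec = (MPI_Wtime() - start) * 1e6 / (2.0 * REPS);
            printf("average one-way latency: %.2f microseconds\n", usec);
        }

        MPI_Finalize();
        return 0;
    }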

    RAIC is a great term to use for home clustering! Cobble together a bunch of older machines, install an operating system on them that supports clustering and run the applications that can take advantage of it. It doesn't have to be an expensive cluster (like mine, which when first deployed in 1990 cost the wrong side of £50,000), it can just be a bunch of EPIAs or something.

    The thing to remember is that in terms of hardware, there's many ways to create a cluster. Blade servers with a common backplane and chassis, separate machines connected via CAT5 cable and Ethernet, or Myrinet (or Infiniband and loads of other interconnect standards), or what have you.

    Just connect machines really

    It's then the software on top that's the difficult part, which I'll cover in due course with my own cluster.

    Rys
    MOLLY AND POPPY!

  5. #21
    HEXUS webmaster Steve's Avatar
    Join Date
    Nov 2003
    Location
    Bristol
    Posts
    14,268
    Thanks
    286
    Thanked
    828 times in 468 posts
    • Steve's system
      • CPU:
      • Intel i3-350M 2.27GHz
      • Memory:
      • 8GiB Crucial DDR3
      • Storage:
      • 320GB HDD
      • Graphics card(s):
      • Intel HD3000
      • Operating System:
      • Ubuntu 11.10
    I'm just disappointed at the lack of the word "beowulf". I guess I spend far too much time on slashdot.

    Keep up the good work Rys
    PHP Code:
    $s = new signature();
    $s->sarcasm()->intellect()->font('Courier New')->display(); 

  6. #22
    Senior Member
    Join Date
    Jul 2004
    Location
    London
    Posts
    2,456
    Thanks
    100
    Thanked
    75 times in 51 posts
    • Mblaster's system
      • Motherboard:
      • ASUS PK5 Premium
      • CPU:
      • Intel i5 2500K
      • Memory:
      • 8gb DDR3
      • Storage:
      • Intel X25 SSD + WD 2TB HDD
      • Graphics card(s):
      • Nvidia GeForce GTX 570
      • PSU:
      • Corsair HX520
      • Case:
      • Antec P180
      • Operating System:
      • Windows 7 Professional x64
      • Monitor(s):
      • HP w2207 (22" wide)
      • Internet:
      • Rubbish ADSL
    As far as connecting them goes, my cluster is currently only connected via regular network cable (CAT5) into a 100Mbit Ethernet switch .... I also have a Myrinet interconnect for the compute nodes, which has much (orders of magnitude!) lower latency than any Ethernet variant, allowing the nodes to talk to each other faster, and higher bandwidth
    So are they all connected by Ethernet now, and will they use Myrinet once you have it set up? Or are just some of the connections Myrinet, and the rest Ethernet?
    I don't mean to sound cold, or cruel, or vicious, but I am so that's the way it comes out.

  7. #23
    Rys
    Rys is offline
    Tiled
    Join Date
    Jul 2003
    Location
    Abbots Langley
    Posts
    1,479
    Thanks
    0
    Thanked
    2 times in 1 post

    Thumbs up

    Quote Originally Posted by Mblaster
    So are they all connected by Ethernet now, and will they use Myrinet once you have it set up? Or are just some of the connections Myrinet, and the rest Ethernet?
    You were right the first time. So...

    They're all connected via Ethernet atm, which will be a permanent fixture since the front-end has no Myrinet and has to issue jobs and the like over Ethernet. However the compute nodes have Myrinet, which when configured will be the interface they use to talk to each other when jobs are running. The Myrinet is idle just now though, unconfigured so far.

    So just now, if I send out a job, it goes out over Ethernet, is computed using Ethernet as the transport, and comes back to the front-end using Ethernet.

    In the future it'll be out over Ethernet, compute over Myrinet, back over Ethernet when the job is done.

    Rys
    MOLLY AND POPPY!

  8. #24
    Rys
    Rys is offline
    Tiled
    Join Date
    Jul 2003
    Location
    Abbots Langley
    Posts
    1,479
    Thanks
    0
    Thanked
    2 times in 1 post

    Lightbulb January 25, 2005

    The final four chodenodes are built

    The final four chodenodes, configured in Rocks as a separate rack cabinet since they're in two piles of four, got built tonight. There was an issue with the front-end dropping the connection that the chodenodes pulled their install images from, but I got there in the end.

    I'll boot the full cluster with all nodes attached at lunchtime tomorrow (today, since it's 3.26am), so expect another update soon after with an obligatory Ganglia screenshot to celebrate bringing all CPUs online.

    What I haven't done so far is detail what software is being used, how it was set up on the front-end and how you bootstrap the compute nodes over the network using the front-end, so I'll do a series of posts sometime this week covering that. It'll be much like what you'll find in the Rocks base install guide, with insight to match my particular cluster setup.

    So that's milestone number one reached: the front-end machine and all the compute nodes are set up and can talk to each other correctly over Ethernet.

    Milestone two will be to have them all running simultaneously, with milestone three being to successfully run jobs over the configured Myrinet.

    Posted by Rys at January 25, 2005 03:24 AM
    Last edited by Rys; 25-01-2005 at 04:33 AM. Reason: Fixing titles
    MOLLY AND POPPY!

  9. #25
    Registered User
    Join Date
    Jan 2005
    Posts
    1
    Thanks
    0
    Thanked
    0 times in 0 posts
    Nice work Rys! I'm good friends with The Tim and have experience with Win2k clusters but very little with Linux - I'll be watching this thread closely.

  10. #26
    Rys
    Rys is offline
    Tiled
    Join Date
    Jul 2003
    Location
    Abbots Langley
    Posts
    1,479
    Thanks
    0
    Thanked
    2 times in 1 post

    Lightbulb January 25, 2005

    Milestone two reached; all 18 CPUs connected

    I switched the front-end on during lunch to have a quick look at the Myrinet configuration (the kernel isn't loading the Myrinet module yet) before turning on the two banks of nodes. Everything came up just fine and the end result brought a smile to my face. All eighteen processors (sixteen for the compute nodes and two for the front-end) can be seen, and the graphs at the top of that Ganglia display show just how I booted it.

    If you look at the memory graph, which I've overlaid with a few labels, you can see me bring up the front-end, which registers itself and says hello to Ganglia. Then I bring up the first 'cabinet' of compute nodes a short while later, once I was done playing with the Myrinet config, and the second cabinet of nodes after the first four are showing in Ganglia.

    That information is shown properly in the load graph too; I just used the memory size graph since it has fewer metrics to look at. So in that graph, the CPU count starts at 2, rises to 10, then to 18 (red metric). You can see the same trend in the node count metric (green). Notice how the process count spikes as each node is switched on, as they talk to the front-end to say hello and join the cluster.

    Myrinet configuration is next, sometime this week or at the weekend.

    Posted by Rys at January 25, 2005 02:09 PM
    MOLLY AND POPPY!

  11. #27
    Rys
    Rys is offline
    Tiled
    Join Date
    Jul 2003
    Location
    Abbots Langley
    Posts
    1,479
    Thanks
    0
    Thanked
    2 times in 1 post

    Lightbulb January 26, 2005

    99% of the way to milestone three, Myrinet is up (just)

    Tim and I successfully brought up the Myrinet interconnect on superchode tonight. Rocks gets you most of the way there on a default 3.3.0 install, but not quite. Instead of quickly fixing the default GM (Myricom's message passing layer for Myrinet) install supplied with Rocks, I decided to self-upgrade to 2.0.17 instead, which is the current release.

    I did the initial testing and bootstrap of the GM mapper - software which, among other things, controls the Myrinet routing - on chodenode-0-0 (the first one in cabinet 0), which went fine, so Tim and I deployed it out onto a cabinet each, using NFS stores for common data. I'll document that process in due course. chodenode-1-0 has a slight cabling issue (masses of CRC errors logged by the hardware and no route to the mapper running on 0-1) which seems to be fixed now, but I'll be investigating getting some spare cables, just in case.

    It all looks good now anyway. In terms of milestones, we didn't quite get round to pushing a job out over the cluster while the Myrinet was up, due to the problems with 1-0 and the time it took (it's now nearly 2am and Alex will kill me when I get to bed), so milestone three isn't officially met. But I'm sure we'll get there tomorrow when I bring it back up, either at lunchtime if I'm not too busy (I probably will be) or at night.

    In terms of performance, GM's benchmarking tools showed very low latency (data being moved in periods measured in the low numbers of microseconds) across seven hosts earlier, and in excess of 2Gbit/sec of bandwidth both reading and writing, per node. Inter-node bandwidth with all nodes should therefore be in the region of 16Gbit/sec.

    Posted by Rys at January 26, 2005 01:45 AM
    Last edited by Rys; 26-01-2005 at 03:00 AM. Reason: Forgot the post title
    MOLLY AND POPPY!

  12. #28
    Rys
    Rys is offline
    Tiled
    Join Date
    Jul 2003
    Location
    Abbots Langley
    Posts
    1,479
    Thanks
    0
    Thanked
    2 times in 1 post

    Lightbulb February 07, 2005

    Rebuild success; milestone three reached

    Tim and I spent the day rebuilding superchode to add some new Rolls to the cluster (Rolls are bundles of files and configuration data used to add functionality to a Rocks cluster). It's now in much better shape to be used for useful work, with clustered Java, C, C++ and Fortran implementations available for coding with, and their requisite libs for precompiled stuff.

    Sun's Grid Engine is the default scheduler now, too. You submit jobs to the SGE queue and it sends them out to the required nodes for processing. There's a bunch of queue tools for monitoring, and a web queue monitor if we're feeling lazy and can't be bothered SSHing into the front-end machine.

    The rebuild was made painful by an automount issue on the front-end (automount is the daemon that manages NFS shares for the home directories of users on the cluster) and the requirement for more swap space than we'd originally allocated, meaning that the first attempted rebuild fell over during the configuration phase.

    However it's all up and running again, Myrinet is up and jobs can be scheduled over GM on the compute nodes. In short, the cluster is fully operational and better equipped to do some real work.

    Milestone three was reached with a successful run of Linpack over Myrinet and two compute nodes, just before I shut it down for the night.

    Milestone four will be the completion of useful work using the cluster. More on that after I rope Tom into things.

    Posted by Rys at February 7, 2005 01:16 AM
    MOLLY AND POPPY!

  13. #29
    Senior Members' Member Matt1eD's Avatar
    Join Date
    Feb 2005
    Location
    London
    Posts
    2,462
    Thanks
    0
    Thanked
    0 times in 0 posts
    • Matt1eD's system
      • Motherboard:
      • MSI K9N6SGM-V GeForce 6100
      • CPU:
      • Athlon 64 LE-1620 2.41GHz
      • Memory:
      • 2 GB DDR2
      • Storage:
      • 1.25 TB
      • Graphics card(s):
      • Onboard
      • PSU:
      • eBuyer Extra Value 500W!
      • Operating System:
      • XP Pro
    Pity you're selling it.

    Is there, like, a simple way to get that sort of power shared? Like a rendering farm?

  14. #30
    Vive le pants! directhex's Avatar
    Join Date
    Jul 2003
    Location
    /dev/urandom
    Posts
    17,074
    Thanks
    228
    Thanked
    1,027 times in 678 posts
    • directhex's system
      • Motherboard:
      • MSI X99A Gaming 7
      • CPU:
      • Intel Core i7 5280k
      • Memory:
      • 32GiB ADATA DDR4
      • Storage:
      • Corsair Neutron XT 960GB
      • Graphics card(s):
      • MSI GTX 980 Gaming 4G Twin Frozr 5
      • PSU:
      • Corsair AX860i
      • Case:
      • NZXT H440
      • Operating System:
      • Ubuntu 17.10, Windows 10
      • Monitor(s):
      • Dell U2713HM
      • Internet:
      • FIOS
    Quote Originally Posted by Matt1eD
    Is there, like, a simple way to get that sort of power shared? Like a rendering farm?
    what do you mean by "shared", and what's your understanding of the words "rendering farm"?

  15. #31
    Senior Members' Member Matt1eD's Avatar
    Join Date
    Feb 2005
    Location
    London
    Posts
    2,462
    Thanks
    0
    Thanked
    0 times in 0 posts
    • Matt1eD's system
      • Motherboard:
      • MSI K9N6SGM-V GeForce 6100
      • CPU:
      • Athlon 64 LE-1620 2.41GHz
      • Memory:
      • 2 GB DDR2
      • Storage:
      • 1.25 TB
      • Graphics card(s):
      • Onboard
      • PSU:
      • eBuyer Extra Value 500W!
      • Operating System:
      • XP Pro
    Quote Originally Posted by directhex
    what do you mean by "shared", and what's your understanding of the words "rendering farm"?
    shared - like my computer connected to loads of others over some sort of network switch to do more work for me.

    rendering farm - a computer cluster thing that, like, has everything split up (like images of a film) between the processors of the farm's other computers. No doubt I'm wrong, but it's a big version of what a dual processor mobo does.

  16. #32
    Vive le pants! directhex's Avatar
    Join Date
    Jul 2003
    Location
    /dev/urandom
    Posts
    17,074
    Thanks
    228
    Thanked
    1,027 times in 678 posts
    • directhex's system
      • Motherboard:
      • MSI X99A Gaming 7
      • CPU:
      • Intel Core i7 5280k
      • Memory:
      • 32GiB ADATA DDR4
      • Storage:
      • Corsair Neutron XT 960GB
      • Graphics card(s):
      • MSI GTX 980 Gaming 4G Twin Frozr 5
      • PSU:
      • Corsair AX860i
      • Case:
      • NZXT H440
      • Operating System:
      • Ubuntu 17.10, Windows 10
      • Monitor(s):
      • Dell U2713HM
      • Internet:
      • FIOS
    well, that's what confused me - a cluster is designed for precisely that!

    you connect to a "front end" or "master" node, and submit a "job" (usually a script detailing work to be done) using a scheduler - Sun GridEngine is a common scheduler, as is PBSPro or Torque. The scheduler then sends your job script to X number of machines in the cluster (as specified when you ran the submission), and lets the job run either to completion or until a time limit has been reached.

    What you need, though, are programs designed to be run on a cluster - more specifically, they need to use MPI (Message Passing Interface) to send messages to each other. In this case, SuperChode has a dedicated Myrinet (low latency) network interconnect to do message passing, as well as regular Ethernet for file transfer et al. Using a Myrinet-capable application (or compiling one on the master node) means you can split the job up at will.

    Alternatively, you could submit sixteen 1-CPU jobs with their own submission scripts and no need to interoperate; the scheduler would take care of that.
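
    A minimal sketch of the message-passing pattern described above, in C with MPI: rank 0 plays the front-end, handing out tasks and collecting answers, while the other ranks do the computing. The task itself (squaring an integer), the task count and the tag numbers are invented for illustration; a real job would do proper work in their place.
    Code:
    /* Hypothetical sketch: a master/worker task pool over MPI messages. */
    #include <mpi.h>
    #include <stdio.h>

    #define TASKS 16   /* assumed number of work items */

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "needs at least one worker rank\n");
            MPI_Finalize();
            return 1;
        }

        if (rank == 0) {
            int next = 0, outstanding = 0, stop = -1;
            int results[TASKS];

            /* Prime each worker with one task, or tell it to stop if
             * there aren't enough tasks to go round. */
            for (int w = 1; w < size; w++) {
                if (next < TASKS) {
                    MPI_Send(&next, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
                    next++; outstanding++;
                } else {
                    MPI_Send(&stop, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
                }
            }

            /* Collect answers, handing out further tasks as workers free up. */
            while (outstanding > 0) {
                int reply[2];   /* reply[0] = task id, reply[1] = result */
                MPI_Status st;
                MPI_Recv(reply, 2, MPI_INT, MPI_ANY_SOURCE, 1,
                         MPI_COMM_WORLD, &st);
                results[reply[0]] = reply[1];
                outstanding--;
                if (next < TASKS) {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                    next++; outstanding++;
                } else {
                    MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                }
            }

            for (int t = 0; t < TASKS; t++)
                printf("task %d -> %d\n", t, results[t]);
        } else {
            /* Worker: take tasks until told to stop, sending answers back. */
            for (;;) {
                int task;
                MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (task < 0)
                    break;
                int reply[2] = { task, task * task };   /* stand-in for real work */
                MPI_Send(reply, 2, MPI_INT, 0, 1, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }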
