Thursday, August 30, 2007

Seagate to offer solid-state drives in 2008

Because solid-state drives can only be written and read a finite number of times, they have a very limited life expectancy. For example, if you ran a benchmark test that updated the disk, say, 100,000 times, the disk would be toast. These disks really need a disk manager that shows what percentage of their life has been used and estimates how long they will last based on usage similar to what they have experienced over the last day, week, month, or some user-specified period of time.
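Roughly, I'm imagining something like the sketch below in Python; the WearMonitor name, the 100,000-cycle rating, and the capacity figure are all placeholder assumptions just to show the idea, not any vendor's actual tooling.

import time

class WearMonitor:
    """Tracks writes to an SSD and estimates remaining life from recent usage."""

    def __init__(self, rated_write_cycles=100000, capacity_gb=64):
        # Assumed endurance: rated cycles times capacity gives total writable gigabytes.
        self.rated_total_writes_gb = rated_write_cycles * capacity_gb
        self.written_gb = 0.0
        self.history = []  # (timestamp, gigabytes written)

    def record_write(self, gigabytes):
        self.written_gb += gigabytes
        self.history.append((time.time(), gigabytes))

    def percent_life_used(self):
        return 100.0 * self.written_gb / self.rated_total_writes_gb

    def estimated_days_left(self, window_seconds=7 * 24 * 3600):
        # Extrapolate from the writes seen in a recent window (e.g. the last week).
        cutoff = time.time() - window_seconds
        recent_gb = sum(gb for ts, gb in self.history if ts >= cutoff)
        if recent_gb == 0:
            return float("inf")
        gb_per_day = recent_gb / (window_seconds / 86400.0)
        remaining_gb = self.rated_total_writes_gb - self.written_gb
        return remaining_gb / gb_per_day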

As it stands, these disks are really only good for data that doesn't change and only needs to be read into RAM once each time the computer is powered on. Anything more and you are asking for trouble. Probably not many people realize these drives are about as short-lived as the batteries built into many products; that is, they are engineered to wear out.

What's the solution? This product screams for a new market niche. This niche will consist of third-party products that configure RAM disks and management software to avoid using the SS drives more than absolutely necessary. Here's how I see them working (a rough sketch in code follows the list):
1) You need a motherboard that can hold 10+ GB of RAM.
2) You need a driver that partitions some amount of RAM into a RAM disk.
3) Calls to the SS drive are intercepted by the driver and read into the RAM disk, and the RAM disk then feeds the program that asked for the data.
4) A separate database resides on the RAM disk specifying what files have been written to and when, the frequency of the writes, and other disk usage stats. All of these stats are read into the RAM disk at boot.
5) Programs doing virus scans and such, which do not change the files, should be blocked by the disk management software from actually reading the SS drive. Imagine a firewall doing this.
6) Writes back to the SS drive should be done intelligently: say, on power-down, on important changes, or at specific times.
7) Ideally the RAM could be part of a RAID controller card that has battery backup.
8) Frequently used data could be prefetched when the CPU is not in use.
9) You'd want a web interface so it would be as portable as a Linksys firewall. And like Linksys, you'd want the upgrades to be managed over the web with just a click of a button.
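Here's a minimal sketch, in Python, of how I imagine the RAM-disk layer working. The RamDiskCache class and its methods are made up for illustration; a real driver would live in the kernel, not in a script.

class RamDiskCache:
    """Sketch of the RAM-disk layer from the list above (names are hypothetical)."""

    def __init__(self, backing_store):
        self.backing = backing_store   # the SS drive: anything with read/write by path
        self.cache = {}                # path -> bytes held in RAM
        self.dirty = set()             # paths written since the last flush
        self.stats = {}                # path -> write count (the usage database, item 4)

    def read(self, path):
        # Item 3: intercept the read and serve it from RAM whenever possible.
        if path not in self.cache:
            self.cache[path] = self.backing.read(path)
        return self.cache[path]

    def write(self, path, data):
        # Writes land in RAM only; the SS drive is untouched until a flush.
        self.cache[path] = data
        self.dirty.add(path)
        self.stats[path] = self.stats.get(path, 0) + 1

    def flush(self):
        # Item 6: write back intelligently, e.g. on power-down or on a timer.
        for path in self.dirty:
            self.backing.write(path, self.cache[path])
        self.dirty.clear()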


This type of software would dramatically lengthen the SS disk's life, not to mention making all hard disks much, much faster. The key is offloading all the intelligence to a PCI-E card that can hold gigabytes of cache. The cheapest RAM could be used because it would obviously still be much faster than any HD. You'd want a Linux OS to control it. The hard part would be getting the RAID working on it; maybe a RAID 1 solution would be a great start.

4 comments:

. said...

On further thought, I think what's needed is a SATA cache box that sits between the HD and whatever it's connected to.

This box would hopefully be able to hold many gigabytes: maybe 16 to 32 DIMM slots, holding 16 to 128GB. The card would also hold an index of everything on each of the hard disks attached to it. There would be a Linux OS with a memory stick holding the index, so the box can function over long periods without power. On a power outage the memory stick would also be used to write out the data not yet stored on the disks before powering down completely.

The caching mechanism would concentrate on preloading files that are expected to be used often.
1) Files pulled up one after another at a certain time each day will be recognized as being read by a virus scanner and kept out of the cache.
2) Newly written files will be stored in the cache for a while. If they don't get accessed often, then good riddance; perhaps an aging process.
3) All attached disks will share one big cache.

Maybe an index of all sectors is what's needed, since the box sees data spread across multiple disks rather than individual files. So something like an index of disk/sector/whatever. Then some clever mechanism to cache based on location, access patterns, times, and whatever the people who think about this sort of thing can come up with. Maybe, just maybe, everything except for disk scans can be cached so reads are always fast.

Maybe to begin with it would be easiest to simply age out the least recently requested data. Perhaps also an Ethernet connection so the box could be monitored; I'd be interested in knowing my hit rate. Then more RAM could be added.
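As a first cut at that aging idea, here's a plain least-recently-used cache keyed by (disk, sector), sketched in Python. The names and the hit/miss counters are just an illustration of what the monitoring interface might report.

from collections import OrderedDict

class SectorCache:
    """LRU cache over (disk, sector) keys; evicts the least recently requested."""

    def __init__(self, max_sectors):
        self.max_sectors = max_sectors
        self.sectors = OrderedDict()   # (disk, sector) -> data, oldest first
        self.hits = 0
        self.misses = 0                # the hit rate you'd want to see over Ethernet

    def get(self, disk, sector, read_from_disk):
        key = (disk, sector)
        if key in self.sectors:
            self.sectors.move_to_end(key)   # mark as recently used
            self.hits += 1
            return self.sectors[key]
        self.misses += 1
        data = read_from_disk(disk, sector)
        self.sectors[key] = data
        if len(self.sectors) > self.max_sectors:
            self.sectors.popitem(last=False)   # age out the oldest entry
        return data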

Perhaps each of these could handle up to 8 disks. So you would have 8 incoming SATA cables and 8 outgoing SATA cables plus one Ethernet connection. It may be limited by how fast the CPU and Linux OS in each cache-disk is. It needs to pump out 300MB/s to the computer and it might be receiving that much per second from the disk. If writing can take place at the same time, then you've got another 600MB/s to process. So each HD may require 1.2GB/s of processing. Eight disks would then require a CPU with a Linux OS capable of pumping 9.6GB/s through it while handling the Ethernet interface and the cache. Then there is error handling. This could be a fun project.
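To spell out the arithmetic, assuming 300MB/s per SATA link:

# Back-of-the-envelope bandwidth from the paragraph above.
link_mb_s = 300
per_disk = link_mb_s * 2 * 2      # read traffic on both sides, plus write traffic on both sides
total = per_disk * 8              # eight disks per cache box

print(per_disk)   # 1200 MB/s, i.e. about 1.2 GB/s per disk
print(total)      # 9600 MB/s, i.e. about 9.6 GB/s for eight disks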

. said...

I'm guessing the CPU would not be the limiting factor since RAID cards can handle 24 SATA disks at full speed.

Then for max speed you'd always want a RAID card talking to multiple disks so you push the wires to their limits.

I'm not sure how best to handle writes. It seems like some logic that groups the writes so the head moves the least might be a great idea; you'd do the queuing on the cache disk. I think I'd just keep it simple to start with.
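One simple version of that grouping, sketched in Python: sort the queued writes by block address before flushing so the head sweeps across the disk in a single pass. The function name and the sample numbers are purely illustrative.

def order_writes(pending):
    """pending: list of (lba, data) tuples queued on the cache box."""
    # Flush in ascending block order, a crude one-direction elevator.
    return sorted(pending, key=lambda write: write[0])

# Example: scattered writes get flushed in one sweep of the head.
queued = [(9000, b"c"), (120, b"a"), (4500, b"b")]
for lba, data in order_writes(queued):
    print(lba)   # 120, 4500, 9000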

A TPF-type distribution system of memory chunks might be best. Imagine the speed! This whole system would be list-driven, with an execution length in the hundreds.

Sticking something like this in front of an SS disk might be the perfect way to extend its life.

. said...

Options to specify when and how to flush the cache drive would be great: say, on power-down or power-up, when the dirty cache grows too large to fit on the memory stick, or on a time basis, like every hour or day.
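Something like this is what I have in mind for the flush options, sketched in Python. The thresholds and field names are placeholders, not any real product's settings.

import time
from dataclasses import dataclass

@dataclass
class FlushPolicy:
    flush_on_power_event: bool = True     # flush on power-down / power-up
    max_dirty_bytes: int = 4 * 2**30      # dirty data must still fit on the memory stick
    interval_seconds: int = 3600          # time-based flush: hourly, daily, etc.

    def should_flush(self, dirty_bytes, last_flush, power_event=False):
        if power_event and self.flush_on_power_event:
            return True
        if dirty_bytes >= self.max_dirty_bytes:
            return True
        return time.time() - last_flush >= self.interval_seconds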

I'm thinking how great it would be to have a RAID 60 system on top of 12 disks. You'd be pumping the PCI-E channel with user data without having to wait for the slowest disk. So good throughput with no delays: speeds of 10x300MB/s, or 3GB/s. The I/O subsystem may no longer be the bottleneck in the future.

. said...

Imagine a disk that is actually just a SATA-to-USB converter with one or more memory sticks in the back of it.

Then, with one of these cache disks fronting 8 of them in a RAID 6 setup, you could get 6x300MB/s, or 1.8GB/s.

Now if you had a contraption to make each USB stick look like its own disk, and stuck in 24, you could do magic.