embryo2 wrote:
About performance - smaller switches lead to lower current and lower power requirements, which in turn means less heat; at the same heat budget, that allows more switching events per second. But at some point processor frequency stopped growing. What is the reason for it? There's still too much heat despite the smaller switching current. But why?
As process node sizes decrease, the smaller switches do become more power-efficient, but that improvement is offset by the fact that the circuit is now physically denser, which makes heat removal trickier. In particular, supply voltage stopped scaling down with feature size (the breakdown of Dennard scaling), so power per transistor fell more slowly than transistor density rose, and power density per unit area climbed.
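To make that concrete, here's a back-of-envelope sketch in Python using the standard CMOS dynamic-power relation P ≈ α·C·V²·f. All the specific numbers (activity factor, capacitance, the 0.7x/2x scaling ratios) are illustrative guesses on my part, not measured data:

```python
# Back-of-envelope: dynamic power and power density under scaling.
# P_dynamic ~ alpha * C * V^2 * f (activity factor, switched capacitance,
# supply voltage, clock frequency). All numbers are illustrative guesses.

def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Classic CMOS dynamic (switching) power estimate."""
    return alpha * c_farads * v_volts**2 * f_hz

# Hypothetical node shrink: capacitance drops ~0.7x per step, but the
# supply voltage no longer scales down with it (Dennard breakdown).
p_old = dynamic_power(alpha=0.1, c_farads=1e-15, v_volts=1.2, f_hz=3e9)
p_new = dynamic_power(alpha=0.1, c_farads=0.7e-15, v_volts=1.2, f_hz=3e9)

# Transistor density roughly doubles per node, so power *per unit area*
# goes up even though power per transistor goes down:
density_ratio = 2.0
print("power/transistor ratio:", p_new / p_old)                 # ~0.7
print("power-density ratio:  ", density_ratio * p_new / p_old)  # ~1.4
```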
There are also a lot of issues with running up clock rates that don't relate directly to the transistors themselves. As clock speed increases, digital circuits start to behave much more like analog systems, where it becomes much harder to control impedance, reflections, and the like. EMI/RFI becomes an issue because, at higher speeds, the trace length becomes a larger fraction of the effective wavelength of the signal that trace carries. Things like that require specialized dielectric control to maintain the proper RF qualities in the silicon and its packaging.
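As a rough illustration of the wavelength point, here's a quick sketch using the common rule of thumb that a trace starts acting like a transmission line once it exceeds about a tenth of a wavelength. The dielectric constant and the 1/10 fraction are my own assumed figures, not from any particular design guide:

```python
import math

# When does a PCB trace stop being "just a wire"? Common rule of thumb:
# once it's longer than ~1/10 of the signal wavelength, reflections and
# impedance control start to matter. Illustrative numbers only.

C = 3e8          # speed of light, m/s
ER_EFF = 4.0     # assumed effective dielectric constant (FR-4-ish)

def critical_trace_length_mm(f_hz, fraction=0.1):
    """Trace length at which it reaches `fraction` of a wavelength."""
    wavelength = C / (f_hz * math.sqrt(ER_EFF))
    return fraction * wavelength * 1e3  # millimetres

for f in (100e6, 1e9, 4e9, 10e9):
    print(f"{f/1e9:5.1f} GHz -> ~{critical_trace_length_mm(f):6.1f} mm")
# At 100 MHz you have ~150 mm of slack; at 10 GHz only ~1.5 mm.
```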
Beyond the transistors, many systems external to the CPU fail to scale in proportion to CPU speed, such as I/O components and RAM. There's a reason we're in the DDR4 era and core RAM clock frequencies still haven't exceeded roughly 1.6 GHz. Granted, double data rate fudges that a bit by transferring on both clock edges. But in general, running up the core clock can create or exacerbate instabilities between the CPU and the rest of the system.
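For instance, here's how DDR gets its headline numbers out of that modest core clock. The figures are illustrative, sized to a DDR4-3200-style part:

```python
# How DDR gets big transfer rates from a modest clock: data moves on
# both the rising and falling clock edges. Illustrative DDR4-ish figures.

io_clock_hz = 1.6e9          # I/O bus clock (~1.6 GHz, per the post)
transfers_per_clock = 2      # "double data rate": both clock edges
bus_width_bytes = 8          # 64-bit module

transfers_per_s = io_clock_hz * transfers_per_clock
bandwidth = transfers_per_s * bus_width_bytes   # bytes/second

print(f"{transfers_per_s/1e6:.0f} MT/s")   # 3200 MT/s
print(f"{bandwidth/1e9:.1f} GB/s")         # ~25.6 GB/s per channel
```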
Probably the biggest factor, oddly enough, is Intel. Processor clock speeds (and Intel's especially) were on the rise across the industry right up to the Pentium 4. The NetBurst microarchitecture was designed to scale above 8 GHz, but it didn't anticipate the power cost of running much above 3 GHz, and past roughly 4 GHz the thermal dissipation shot through the roof. Those cores are the ones favored by extreme overclockers because they've hit 8.1 GHz, but it takes liquid nitrogen to get them there. After NetBurst, Intel concluded it was impractical to sell desktop processors that fast, because they'd demand exotic cooling systems simply to function; the high 3s were about as far as the speed could be pushed, and so most parts' PLLs are designed to cap their multipliers around that range.
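To show what "capping the multiplier" means in practice (the base clock and multiplier below are hypothetical, typical-looking values, not any specific part's):

```python
# A CPU's core clock is derived from a slow reference via a PLL:
# core_clock = base_clock * multiplier. Capping the multiplier caps
# the core speed. Hypothetical but typical-looking numbers below.

base_clock_hz = 100e6     # common 100 MHz BCLK reference
max_multiplier = 39       # a cap "in the high 3s" as described above

core_clock = base_clock_hz * max_multiplier
print(f"max core clock: {core_clock/1e9:.1f} GHz")   # 3.9 GHz
```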
The knock-on effect was that Intel pursued performance by other means: multicore processors, multithreading, larger caches, deeper and more superscalar pipelines, higher bus speeds. Intel pushes fab technology along quite effectively - they were one of the big spurs behind the development of EUV lithography - so when they do something, the industry at large tends to read the writing on the wall. Hence, no CPUs shipping much above 3.5 GHz or so. And once the infrastructure for handling multicore processing well had been developed, everyone probably just figured they'd keep doing that (adding more cores, optimizing the inter-core communication schemes) rather than trying to force single cores into the millimeter-wave range.
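The catch with adding more cores is that the speedup is bounded by the serial fraction of the workload. Amdahl's law isn't mentioned above, but it's the standard way to frame that trade-off, and a few lines of Python make the ceiling obvious:

```python
# Amdahl's law: with fraction p of the work parallelizable across n
# cores, overall speedup = 1 / ((1 - p) + p / n). Even a small serial
# fraction caps what extra cores can buy you.

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8, 16, 64):
    print(f"{n:3d} cores: {amdahl_speedup(0.9, n):.2f}x (90% parallel)")
# 64 cores on a 90%-parallel task yield only ~8.8x, which is why raw
# clock speed never stopped mattering entirely.
```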
Recently, Intel has started showing signs of questioning that philosophy, as recent Xeons have scaled (via Turbo Boost) to 4 GHz. So, in the grand scheme of things, the "speed versus parallelism" argument may start to lean in favor of speed again.
That said, transistors themselves have kept getting faster. IBM announced graphene-based transistors with a practical cutoff frequency of 500 GHz several (~10) years back. Nowadays, "exotic" transistors such as MODFETs are routinely shown to have power gain extending into the 1 THz range. Why we aren't building microprocessors out of them is a separate discussion, however.
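For reference, the headline "cutoff" figure for such devices is usually the unity-current-gain frequency, roughly f_T ≈ gm / (2π·Cg). A toy calculation, with device parameters I've invented purely to show the scale involved:

```python
import math

# Unity-current-gain cutoff frequency of a FET: f_T ~ gm / (2*pi*Cg).
# The device parameters below are invented for illustration only.

def cutoff_frequency_hz(gm_siemens, cg_farads):
    """Approximate f_T for a FET with transconductance gm, gate cap Cg."""
    return gm_siemens / (2 * math.pi * cg_farads)

# A hypothetical high-mobility device: high gm, tiny gate capacitance.
ft = cutoff_frequency_hz(gm_siemens=3e-3, cg_farads=1e-15)
print(f"f_T ~ {ft/1e9:.0f} GHz")   # ~477 GHz for these made-up numbers
```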
embryo2 wrote:
Is it only a theoretical issue (lack of a proper mathematical model to calculate some technology parameters)?
I wouldn't say so. If we're able to hit 14nm, we're able to go lower. IBM announced about a year ago that it had demonstrated functional transistors on a 7nm SiGe process, and the first commercial production at 7nm is likely to start within the next five years or so.
What I mean is that as we approach really small scales, quantum effects become more and more of an issue. The notion that a transistor's behavior can be captured by closed-form device parameters - transconductance, intrinsic charge, and so on - increasingly has to yield to statistical-mechanics and quantum-mechanics treatments, because we're entering the regime where those effects become relevant. It's a bit like being shrunk to the size of a flea: suddenly an ant is a much more formidable, comparably-sized adversary.
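As one concrete example of such an effect, consider gate-oxide tunneling. A simplified rectangular-barrier WKB estimate (with a barrier height and thicknesses I've picked as rough, illustrative values) shows how violently the tunneling probability depends on thickness at these scales:

```python
import math

# Simplified WKB estimate of electron tunneling through a rectangular
# barrier: T ~ exp(-2 * kappa * d), kappa = sqrt(2*m*phi)/hbar.
# Barrier height and thicknesses are rough, illustrative values.

HBAR = 1.0545718e-34   # J*s
M_E  = 9.10938e-31     # electron mass, kg
EV   = 1.602177e-19    # J per eV

def tunneling_probability(barrier_ev, thickness_nm):
    kappa = math.sqrt(2 * M_E * barrier_ev * EV) / HBAR
    return math.exp(-2 * kappa * thickness_nm * 1e-9)

for d in (3.0, 2.0, 1.0):   # oxide thickness in nm
    p = tunneling_probability(barrier_ev=3.0, thickness_nm=d)
    print(f"{d:.1f} nm barrier: T ~ {p:.1e}")
# Shaving 2 nm off the barrier raises the tunneling probability by many
# orders of magnitude - classical device formulas don't capture this.
```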
embryo2 wrote:
What is the advantage of such a layout in terms of performance, cost, or power consumption? Is it just equivalent to a number of separate devices, or can it deliver some essential advantage in speed, power, or cost?
Are you talking about 3D ICs or IC design in general? Laying out a circuit into distinct areas by function and technology type is a core stage in semiconductor design known as floorplanning - it's been standard practice ever since integration became a thing. One might also consider the influence of business models such as ARM's on the industry. ARM sells processor cores and peripherals to companies who then lay them out and fabricate them. By running a highly compartmentalized floorplan, different varieties of the same system-on-chip family can be created simply by swapping out the amount of RAM/flash and the set of peripherals while the processor core itself remains constant. That's why, when you go to buy embedded processors, you often find a multitude of options varying in the number of communication buses, the amount of program/data memory, 2- versus 4-core configurations, specialty units like cryptographic coprocessors, and so on. Most of the layout of those variations is the same - they're punched from the same basic die design.
As to 3D ICs, the big bonus is a decrease in routing congestion, both on the die and on the circuit board, as well as a general increase in board density thanks to the vertical stacking. There are also power considerations: most devices nowadays use 1.0 V or 1.2 V as their internal logic voltage, and to interface with the external world they need driver stages to step that up to the standard CMOS levels of 1.8 V, 2.5 V, or 3.3 V. A 3D IC construction lets the stacked layers talk to each other at their internal voltage levels, without the power-hungry driver stages. That's a power bonus, though I can't say how significant.
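To put a rough number on that bonus: switching energy on a signal line goes as C·V², so signaling at the core voltage instead of a 3.3 V I/O level already saves roughly an order of magnitude per transition, before you even count the shorter wires. The capacitance values below are illustrative guesses, not measurements:

```python
# Switching energy per transition on a signal line: E = C * V^2.
# Comparing an internal-voltage link against a stepped-up I/O driver.
# Capacitance values are illustrative guesses, not measured.

def switching_energy_j(c_farads, v_volts):
    return c_farads * v_volts**2

# Hypothetical: a through-silicon via in a 3D stack vs. a board trace.
e_internal = switching_energy_j(c_farads=50e-15, v_volts=1.0)   # TSV
e_external = switching_energy_j(c_farads=5e-12,  v_volts=3.3)   # PCB

print(f"internal link: {e_internal*1e15:.0f} fJ/transition")
print(f"external link: {e_external*1e12:.2f} pJ/transition")
print(f"ratio: ~{e_external/e_internal:.0f}x")
# The V^2 factor alone is ~11x; the lower capacitance of a short
# vertical connection multiplies the saving further.
```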