GBA CPU architecture vs. the clock speed emulators require! Nimble small hands can do what big arms cannot

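orientzhu · posted 2005-3-5 22:08:37

I am completely in awe of BoyFriend's deep expertise in the post above!!
I don't know assembly language, but I read it through several times, trying to understand as much as I could.
From the analysis above of how CPU instructions work, one conclusion seems to follow: the RISC instruction structure is more flexible and efficient than CISC in every way. Take instruction substitution, which was mentioned first: CISC substitution is strict and rigid, while RISC substitution is free and flexible. But when one instruction replaces twelve, couldn't things get mixed up or run out of order? Then there is the pipeline, which can perhaps be compared to the relationship between members and an organization: the bigger the organization, the longer each member's ID number. Since RISC does not have as large an "organization" as CISC, it does not need a very long pipeline, so RISC always executes faster than CISC! Is that right?
But then, if RISC exists, why still use CISC? Wouldn't Intel's CPUs be faster if they were built with RISC technology? Or is RISC not Intel's technology? Is it a clever new Japanese invention?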

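BoyFriend · posted 2005-3-6 07:17:28

The case mentioned above, where one instruction replaces several, does not lead to any mix-up. It is really just an application of the polymorphism idea from object-oriented programming: put simply, the name stays the same, but different input values x and y make it do different things.

Also, the pipeline is decided when the CPU is designed; it has no direct causal relationship with RISC or CISC. The example I gave earlier was meant to show that the ARM CPU's design is far more advanced than the design it was compared with. I have also heard that newer ARM CPUs add an early branch-decision feature: the branch is evaluated ahead of time during fetch, so in the situation from the earlier example the CPU jumps ahead and fetches from line n+1 directly, which saves a good deal of time.

Finally, RISC is now used far more widely than CISC. The vast majority of mobile phones use RISC, and most of those use ARM CPUs. The same goes for other devices such as routers, switches and hubs, most of which use ARM. ARM is arguably the most widely used CPU in the world. IBM's Power CPUs are also widespread: Apple computers all use Power CPUs, and so does Nintendo's GameCube. IBM is also developing Cell, a new-generation RISC CPU that will be used in the PS3. Sun's SPARC is RISC as well. Intel's XScale is an extended version of StrongARM, which ARM and Intel developed together, and it is now widely used in PDAs running the Windows CE family.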

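Moreover, today's AMD CPUs actually use a RISC core. Around the core sits a translation layer that converts CISC (x86) instructions into the core's internal RISC-style operations, so that everyone can keep using their existing CISC operating systems and applications.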

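One more thing: RISC was not invented by the Japanese. The earliest RISC-like machine was designed by Jim Thornton and Seymour Cray. For the details, you will have to read the relevant articles.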

Early RISC

The first system that would today be known as RISC wasn't at the time; it was the CDC 6600 supercomputer, designed in 1964 by Jim Thornton and Seymour Cray. Thornton and Cray designed it as a number-crunching CPU (with 74 op-codes, compared with an 8086's 400) plus 12 simple computers called 'peripheral processors' to handle I/O (most of the operating system was in one of these). The CDC 6600 had a load/store architecture with only two addressing modes. There were eleven pipelined functional units for arithmetic and logic, plus five load units and two store units (the memory had multiple banks so all load/store units could operate at the same time). The basic clock cycle/instruction issue rate was 10 times faster than the memory access time.

Another early load/store machine was the Data General Nova minicomputer, designed in 1968.

The most public RISC designs, however, were the results of university research programs run with funding from the DARPA VLSI Program. The VLSI Program, practically unknown today, led to a huge number of advances in chip design, fabrication, and even computer graphics.

UC Berkeley's RISC project started in 1980 under the direction of David Patterson, based on gaining performance through the use of pipelining and an aggressive use of registers known as register windows. In a normal CPU you have a small number of registers, and a program can use any register at any time. In a CPU with register windows, there are a huge number of registers, 128, but programs can only use a small number of them, 8, at any one time. A program that limits itself to 8 registers per procedure can make very fast procedure calls: The call, and the return, simply move the window to the set of 8 registers used by that procedure. (On a normal CPU, most calls "flush" the contents of the registers to RAM to clear enough working space for the subroutine, and the return "restores" those values).
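The register-window idea described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name `WindowedRegisterFile` and its methods are hypothetical, the windows do not overlap (real designs such as SPARC overlap adjacent windows to pass arguments), and window overflow is simply reported as an error instead of being spilled to memory.

```python
class WindowedRegisterFile:
    """Toy model: 128 physical registers, 8 visible to each procedure."""

    TOTAL = 128   # physical registers, as in the text
    WINDOW = 8    # registers visible at any one time, as in the text

    def __init__(self):
        self.regs = [0] * self.TOTAL
        self.base = 0  # index of the first register in the current window

    def read(self, r):
        if not 0 <= r < self.WINDOW:
            raise IndexError("register outside the visible window")
        return self.regs[self.base + r]

    def write(self, r, value):
        if not 0 <= r < self.WINDOW:
            raise IndexError("register outside the visible window")
        self.regs[self.base + r] = value

    def call(self):
        # A call just slides the window; nothing is copied to RAM.
        if self.base + 2 * self.WINDOW > self.TOTAL:
            raise OverflowError("out of windows (a real CPU would spill here)")
        self.base += self.WINDOW

    def ret(self):
        # A return slides the window back; the caller's values are still there.
        if self.base == 0:
            raise RuntimeError("return without a matching call")
        self.base -= self.WINDOW


rf = WindowedRegisterFile()
rf.write(0, 42)        # caller uses its register 0
rf.call()
rf.write(0, 7)         # callee's register 0 is a different physical register
inner = rf.read(0)
rf.ret()
outer = rf.read(0)     # caller's value survived without any save/restore
```

The point of the sketch is that `call` and `ret` only change `base`, which is why windowed procedure calls can be so cheap compared with flushing registers to RAM.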

The RISC project delivered the RISC-I processor in 1982. Consisting of only 44,420 transistors (compared with averages of about 100,000 in newer CISC designs of the era) RISC-I had only 32 instructions, and yet completely outperformed any other single-chip design. They followed this up with the 40,760 transistor, 39 instruction RISC-II in 1983, which ran over three times as fast as RISC-I.

At about the same time, John Hennessy started a similar project called MIPS at Stanford University in 1981. MIPS focussed almost entirely on the pipeline, making sure it could be run as "full" as possible. Although pipelining was already in use in other designs, several features of the MIPS chip made its pipeline far faster. The most important, and perhaps annoying, of these features was the demand that all instructions be able to complete in one cycle. This demand allowed the pipeline to be run at much higher speeds (there was no need for induced delays) and is responsible for much of the processor's speed. However, it also had the negative side effect of eliminating many potentially useful instructions, like a multiply or a divide.

The earliest attempt to make a chip-based RISC CPU was a project at IBM which started in 1975, predating both of the projects above. Named after the building where the project ran, the work led to the IBM 801 CPU family which was used widely inside IBM hardware. The 801 was eventually produced in a single-chip form as the ROMP in 1981, which stood for Research (Office Products Division) Mini Processor. As the name implies, this CPU was designed for "mini" tasks, and when IBM released the IBM RT-PC based on the design in 1986, the performance was not acceptable. Nevertheless the 801 inspired several research projects, including new ones at IBM that would eventually lead to their POWER system.

In the early years, the RISC efforts were well known, but largely confined to the university labs that had created them. The Berkeley effort became so well known that it eventually became the name for the entire concept. Many in the computer industry argued that the performance benefits were unlikely to translate into real-world settings due to the decreased memory efficiency of multiple instructions, and that this was the reason no one was using them. But starting in 1986, all of the RISC research projects started delivering products. In fact, almost all modern RISC processors are direct copies of the RISC-II design.

Modern RISC

Berkeley's research was not directly commercialized, but the RISC-II design was used by Sun Microsystems to develop the SPARC, by Pyramid Technology to develop their line of mid-range multi-processor machines, and by almost every other company a few years later. It was Sun's use of a RISC chip in their new machines that demonstrated that RISC's benefits were real, and their machines quickly outpaced the competition and essentially took over the entire workstation market.

John Hennessy left Stanford to commercialize the MIPS design, starting the company known as MIPS Computer Systems. Their first design was a second-generation MIPS chip known as the R2000. MIPS designs went on to become one of the most used RISC chips when they were included in the PlayStation and Nintendo 64 game consoles. Today they are one of the most common embedded processors in use for high-end applications.

IBM learned from the RT-PC failure and would go on to design the RS/6000 based on their new POWER architecture. They then moved their existing S/370 mainframes to POWER chips, and found much to their surprise that even the very complex instruction set (dating to the S/360 from 1964) ran considerably faster. The result was the new System/390 series which continues to be sold today as the zSeries. POWER would also find itself moving "down" in scale to produce the PowerPC design, which eliminated many of the "IBM only" instructions and created a single-chip implementation. Today the PowerPC is used in all Apple Macintosh machines, as well as being one of the most commonly used CPUs for automotive applications (some cars have over 10 of them inside).

Almost all other vendors quickly joined. From the UK similar research efforts resulted in the INMOS Transputer, the Acorn Archimedes and the Advanced RISC Machine line, which is a huge success today. Companies with existing CISC designs also quickly joined the revolution. Intel released the i860 and i960 by the late 1980s, although they were not very successful. Motorola built a new design called the 88000 in homage to their famed CISC 68000, but it saw almost no use and they eventually abandoned it and joined IBM to produce the PowerPC. AMD released their 29000 which would go on to become the most popular RISC design in the early 1990s.

Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in use. The RISC design technique offers power in even small sizes, and thus has come to completely dominate the market for low-power "embedded" CPUs. Embedded CPUs are by far the most common market for processors: consider that a family with one or two PCs may own several dozen devices with embedded processors. RISC had also completely taken over the market for larger workstations for much of the 90s. After the release of the Sun SPARCstation the other vendors rushed to compete with RISC based solutions of their own. Even the mainframe world is now completely RISC based.

However, despite many successes, RISC has made few inroads into the desktop PC and commodity server markets, where Intel's x86 platform remains the dominant processor architecture (Intel is facing increased competition from AMD, but even AMD's processors implement the x86 platform, or a 64-bit superset known as x86-64). There are three main reasons for this. One, the x86 had a very large base of proprietary applications, whereas no RISC platform could claim the same, and this allowed x86 chip-makers to enjoy continuous sales despite a lack of performance. The second is that, although RISC was indeed able to scale up in performance quite quickly and cheaply, Intel countered by spending enormous amounts of money on processor development. For example, if it costs ten times as much to design an x86 chip with twice the performance of a competing RISC CPU, then no matter, Intel has ten times the cash and proceeds to do it. In reality Intel has even more than that, and Intel's CPUs continue to make great (and to many, surprising) strides in performance, and more recently so have AMD's CPUs. The third reason is that Intel designers realized that RISC is a set of design philosophies and practices instead of an architecture. Intel started to apply many of the RISC principles to their CISC microprocessors in the 1990s. For example, the Pentium Pro processor has special functional units which crack the majority of the CISC instructions into simpler RISC operations. Internally, the Pentium Pro and descendant processors are RISC machines that emulate a CISC architecture.

The development cost considerations are ignored by consumers, for whom the only considerations are outright speed and compatibility with older machines. This has led to an interesting chain of events. As the complexity of developing more and more advanced CPUs increases, the cost of both development and fabrication of high-end CPUs has exploded. In effect, whatever cost gains RISC gave to the CPU designer have been lost, and today only the biggest chip makers are capable of making high-performing CPUs. The end result is that virtually all RISC platforms, with the exception of IBM's POWER/PowerPC, have either greatly scaled back development of high-performing CPUs (like SPARC and MIPS) or been abandoned (like Alpha and PA-RISC) during the 2000s. As of 2004, x86 chips are the fastest CPUs in SPECint, displacing all RISC CPUs, and the fastest CPU in SPECfp is the IBM POWER5 processor.

Still, RISC designs have led to a number of successful platforms and architectures, some of the larger ones being:

* MIPS's MIPS line, found in most SGI computers and the PlayStation and Nintendo 64 game consoles
* IBM's POWER series, used in all of their minis and mainframes
* Motorola and IBM's PowerPC (a version of POWER) used in all Apple Macintosh computers
* Sun's SPARC and UltraSPARC, found in all of their later machines
* Hewlett-Packard's PA-RISC HP/PA
* DEC Alpha
* ARM — Palm, Inc. originally used the (CISC) Motorola 680x0 processors in their early PDAs, but now uses (RISC) ARM processors in their latest PDAs; Nintendo uses an ARM CPU in the Game Boy Advance and Nintendo DS handheld game systems

BoyFriend · posted 2005-3-6 07:21:37

Another advantage of RISC is that it draws less power and gives off less heat.

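llc · posted 2005-3-7 10:22:12

Besides emulating the CPU, there is also the GPU to emulate. A home console paired with a relatively low-clocked CPU can still produce a lot of game effects, and that is largely thanks to the GPU, its graphics hardware.
On the FC (Famicom), for example, the screen and the characters are drawn as individual tiles. Each fill or erase works on a whole tile and is supported by hardware: you just feed in the tile's new position and the hardware draws the movement on screen by itself, with little further CPU involvement. To emulate that tile processing on a PC, the PC's CPU has to plot the tile's bitmap into memory one pixel at a time before the tile can be displayed. This emulation is done purely in software and eats CPU time, so it takes a CPU with a very high clock speed.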
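A minimal sketch of what "plotting the tile one pixel at a time" means on the PC side. The 8x8 tile size, the 256x240 framebuffer and the function name `blit_tile` are assumptions made for illustration; this is not taken from any real emulator.

```python
TILE = 8                    # assumed tile edge length in pixels
WIDTH, HEIGHT = 256, 240    # assumed framebuffer size


def blit_tile(framebuffer, tile, x0, y0):
    """Copy one tile into the framebuffer pixel by pixel.

    Returns the number of pixel writes the host CPU had to perform.
    On the console, the video hardware does this work by itself.
    """
    writes = 0
    for ty in range(TILE):
        for tx in range(TILE):
            framebuffer[y0 + ty][x0 + tx] = tile[ty][tx]
            writes += 1
    return writes


framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]
tile = [[1] * TILE for _ in range(TILE)]       # a solid tile of color index 1
writes = blit_tile(framebuffer, tile, 16, 32)  # 64 host writes for one tile
```

Moving a single tile costs the host 64 writes here, and a full screen holds hundreds of tiles redrawn every frame, which gives a rough feel for why software emulation of the video hardware is expensive.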

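lanche · posted 2005-3-7 17:09:16

A console's video hardware is designed specifically for game effects. Even on the very old 8-bit Nintendo, the GPU can operate independently on two background layers and three sprite layers at the same time. Emulating them in software naturally puts fairly high demands on the PC's CPU.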

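orientzhu · posted 2005-3-8 10:47:25

Thank you for the explanation, teacher BoyFriend. You have mastered so many things, it's amazing!! I'm so envious!!!
You remind me of a brilliant classmate from my junior high school; you seem to be the same type, a born genius!
I will keep BoyFriend's posts safe! It is rare for a real expert to give me such an important reply!! :0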

orientzhu · posted 2005-3-8 11:02:32

And llc and lanche, your replies taught me even more. Thank you very much!!

qianzheng82 · posted 2005-3-8 16:43:15

It doesn't necessarily have to be hardware emulation; you could emulate the API instead, like wine does.

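sejishikong · posted 2005-3-8 16:59:25

[quote:27be79b216="qianzheng82"]It doesn't necessarily have to be hardware emulation; you could emulate the API instead, like wine does.[/quote]
That exists too, of course. It needs a relatively modest machine, but compatibility is worse. UltraHLE is one example, and most of the later emulators work this way.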

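orientzhu · posted 2005-3-10 08:47:41

[quote:507783d111="sejishikong"][quote:507783d111="qianzheng82"]It doesn't necessarily have to be hardware emulation; you could emulate the API instead, like wine does.[/quote]
That exists too, of course. It needs a relatively modest machine, but compatibility is worse. UltraHLE is one example, and most of the later emulators work this way.[/quote]
When I saw this I thought there was hope, that there really was an emulator that needs fewer resources! API emulation is exactly what I want! I figured sejishikong had known all along and only now let it out, so I excitedly searched the web. Unfortunately, the more I searched the more disappointed I got: there doesn't seem to be one for the GBA, only for the N64 and the like, and I didn't see a Linux version either. Oh well, I'll give up on playing. I'm grown up now anyway.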

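sejishikong · posted 2005-3-10 11:51:29

[quote:44f71cc048="orientzhu"][quote:44f71cc048="sejishikong"][quote:44f71cc048="qianzheng82"]It doesn't necessarily have to be hardware emulation; you could emulate the API instead, like wine does.[/quote]
That exists too, of course. It needs a relatively modest machine, but compatibility is worse. UltraHLE is one example, and most of the later emulators work this way.[/quote]
When I saw this I thought there was hope, that there really was an emulator that needs fewer resources! API emulation is exactly what I want! I figured sejishikong had known all along and only now let it out, so I excitedly searched the web. Unfortunately, the more I searched the more disappointed I got: there doesn't seem to be one for the GBA, only for the N64 and the like, and I didn't see a Linux version either. Oh well, I'll give up on playing. I'm grown up now anyway.[/quote]
Well, GBA emulation doesn't really need HLE. On today's computers, emulating the GBA is fairly easy. HLE is used for the N64 because its requirements are quite high; if you emulated the N64's hardware architecture completely, it is not even certain that today's computers could manage it.
Here is the N64 hardware:
CPU: R-4300 64-bit CPU (93.75 MHz, 112 MIPS)
Memory: 4 MB total
Display: over 100,000 polygons per second, 2.09 million colors on screen at once
Sound: ADPCM, 64 channels

Emulating it directly is very hard.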

easycat · posted 2005-3-10 15:40:11

...Try making a GBA run Quake 3 and see; it would work itself to death.

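Frozenlips · posted 2005-3-15 23:35:04

The Linux users here seem a bit too "specialized"... I had assumed Linux users would all have some understanding of hardware and CPU architecture...

I stumbled in here while looking up the GBA architecture and ROM transfer speeds, and couldn't resist adding a couple of remarks.
For a CPU, the clock frequency does not decide what work it can do. A radish hole is for planting radishes; that doesn't mean a pumpkin or a watermelon will fit in it.
Sadly, everyone has been fooled and blinded by Intel. The real measure of CPU performance is not frequency. The simple measure is MIPS, and the more sophisticated one is execution efficiency (for the principles and for how to calculate CPU performance, see "xxxxx"; sorry, the title is too long and I forgot it, and the book is very thick too).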
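A small worked example of why frequency alone is misleading, using the standard relation MIPS = clock in MHz / average cycles per instruction. The two CPUs and their numbers are made up purely for illustration.

```python
def mips(clock_mhz, cycles_per_instruction):
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cycles_per_instruction


# Hypothetical CPU X: low clock, but one instruction per cycle on average.
cpu_x = mips(100.0, 1.0)
# Hypothetical CPU Y: three times the clock, but four cycles per instruction.
cpu_y = mips(300.0, 4.0)
```

Here CPU X reaches 100 MIPS while CPU Y, despite its higher clock, reaches only 75. MIPS itself is still a crude measure, because instructions on different architectures do different amounts of work.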
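Execution efficiency... is not something that can be measured simply. It usually depends on the CPU architecture, the instruction set, the type of task, the algorithm and so on. For example:
(PS: by the way, neither CISC nor RISC is simply better than the other. Each has its own strengths and the areas it suits, and many CPUs today mix CISC and RISC, taking the best of each. It mainly comes down to the characteristics of each CPU.)
Take running a mov instruction... er, never mind, someone already covered that above. Let me use an analogy instead.
Person A has very nimble hands and can measure out exactly 100 grams every time, but he is weak and cannot lift anything over 20 kg;
Person B may be clumsy with his hands, grabbing 250 or 500 grams at a time, but he is very strong and can lift several 20 kg loads at once;
If you insist on making B do A's job, it is laborious and inefficient: you may have to give him lots of measuring tools, and it takes a lot of time and many steps to get it done.
Emulation works the same way.
In practical terms, the GBA's CPU might have a dedicated instruction that takes the numbers A, B, C and D and evaluates one particular polynomial, finishing that specialized operation in a single clock cycle. If you insist on having a CPU with a different structure and a different instruction set do it, that CPU may need 10 clocks, 20, or even more than 100. Even a CPU that is 10 times faster may not do it as well as the GBA.
That is the difference that different CPU architectures and different application areas make. It is not something you can change by altering an interface or the like.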
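The cost gap can be sketched with a toy interpreter step. Everything here is hypothetical: the guest instruction `POLY`, the register names, and the way host operations are tallied are assumptions for illustration only, not a description of the real GBA instruction set, and the tally is a rough count rather than a cycle measurement.

```python
def emulate_poly(regs, x):
    """Emulate a hypothetical one-cycle guest instruction
    POLY: R = ((A*x + B)*x + C)*x + D.

    Returns (result, host_ops), where host_ops is a rough count of the
    separate steps the host has to perform for this single guest instruction.
    """
    host_ops = 1                      # decode the guest instruction
    coeffs = [regs["A"], regs["B"], regs["C"], regs["D"]]
    host_ops += 4                     # fetch four guest registers from memory
    acc = coeffs[0]
    for c in coeffs[1:]:
        acc = acc * x + c
        host_ops += 2                 # one multiply and one add per step
    regs["R"] = acc
    host_ops += 1                     # write the result back
    return acc, host_ops


regs = {"A": 1, "B": 2, "C": 3, "D": 4, "R": 0}
result, host_ops = emulate_poly(regs, 2)
```

With these inputs the result is 26 and the tally comes to 12 host steps for what the guest would, by assumption, finish in one cycle. A real interpreter adds further overhead such as dispatch and flag updates, so the host needs a much higher clock just to keep pace.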
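To do the job with the same or less CPU effort, there are only two ways:
1. Rewrite the software, that is, the games you are talking about, for the other CPU's architecture. This is simply porting.
2. Change your CPU's architecture... heh... and in the end it turns into the GBA's CPU.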

Frozenlips · posted 2005-3-15 23:38:16

Also, does anyone know the data rate at which the GBA, NDS, PSP and so on read game ROMs?

BoyFriend · posted 2005-3-18 09:46:15

Have a look at the sites below.
http://www.gbadev.org/index.php
http://www.work.de/nocash/gbatek.htm
