Topic: -O2 and -O1 optimizations  (Read 2557 times)

edmon

  • Guest
« Jan 24, 2005, 12:09 »
I came across this page: http://www.network-theory.co.uk/docs/gccintro/gccintro_43.html
and decided to try the example.
These are the results on my Debian testing system.
carredas:/home/edmon# gcc -Wall -O1 test.c -lm
carredas:/home/edmon# time ./a.out
sum = 4e+38

real    0m5.644s
user    0m5.641s
sys     0m0.002s
carredas:/home/edmon# gcc -Wall -O2 test.c -lm
carredas:/home/edmon# time ./a.out
sum = 4e+38

real    0m5.726s
user    0m5.722s
sys     0m0.002s

So the result with -O2 is slightly worse than with -O1.
I tried this on another Debian machine with a different processor, also running testing, and got similar results.
On Debian woody things behave normally, as they do on a Red Hat 9 box.
If anyone has Debian testing, I'd be interested to see you try this and post the result ....
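A gap of 1 to 2% between two single runs can come from run-to-run noise alone, so one way to check it is to time the same loop several times and compare the spread. The following is a minimal sketch, not the code from the linked page: `time_once`, `report_spread` and the idea of repeating the measurement are assumptions added for illustration.

```c
#include <stdio.h>
#include <time.h>

/* Same computation as the benchmark: d multiplied by itself n times. */
static double powern(double d, unsigned long n) {
    double x = 1.0;
    unsigned long j;
    for (j = 1; j <= n; j++)
        x *= d;
    return x;
}

/* Hypothetical helper: runs the summation loop once, returns CPU seconds. */
static double time_once(unsigned long iterations, double *sum_out) {
    clock_t start = clock();
    double sum = 0.0;
    unsigned long i;
    for (i = 1; i <= iterations; i++)
        sum += powern((double)i, i % 5);
    *sum_out = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: repeats the measurement and prints the fastest
   and slowest run, so the spread can be compared between builds. */
static void report_spread(int runs, unsigned long iterations) {
    double best = 0.0, worst = 0.0, sum = 0.0;
    int r;
    for (r = 0; r < runs; r++) {
        double t = time_once(iterations, &sum);
        if (r == 0 || t < best) best = t;
        if (r == 0 || t > worst) worst = t;
    }
    printf("sum = %g\n", sum);
    printf("best = %.3f s, worst = %.3f s over %d runs\n", best, worst, runs);
}
```

Calling `report_spread(5, 100000000UL)` from `main` in one binary built with -O1 and another built with -O2 shows whether the two ranges overlap. If they do, the difference is probably noise rather than a regression.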

astronom

  • Advanced
  • *****
  • Posts: 254
« Reply #1: Jan 24, 2005, 14:27 »
Sample code

[unknown@ws-unknown test]$ uname -a
Linux ws-unknown 2.6.8-1.521 #1 Mon Aug 16 09:01:18 EDT 2004 i686 athlon i386 GNU/Linux
[unknown@ws-unknown test]$ cat /etc/fedora-release
Fedora Core release 2 (Tettnang)
[unknown@ws-unknown test]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(TM) XP 1800+
stepping        : 1
cpu MHz         : 1916.546
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3776.51

[unknown@ws-unknown test]$ gcc --version
gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[unknown@ws-unknown test]$ cat test.c
#include <stdio.h>

double powern (double d, unsigned long n) {
  double x = 1.0;
  unsigned long j;

  for (j = 1; j <= n; j++)
    x *= d;

  return x;
}

int main (void) {
  double sum = 0.0;
  unsigned long i;

  for (i = 1; i <= 100000000; i++) {
    sum += powern (i, i % 5);
  }

  printf ("res = %g\n", sum);
  return 0;
}

[unknown@ws-unknown test]$ gcc -Wall -O0 -o test-O0 test.c
[unknown@ws-unknown test]$ gcc -Wall -O1 -o test-O1 test.c
[unknown@ws-unknown test]$ gcc -Wall -O2 -o test-O2 test.c
[unknown@ws-unknown test]$ gcc -Wall -O3 -o test-O3 test.c
[unknown@ws-unknown test]$ gcc -Wall -O3 -funroll-loops -o test-O3-unroll test.c
[unknown@ws-unknown test]$ ls -l
total 44
-rw-r--r--  1 unknown Users  334 Jan 24 14:22 test.c
-rwxr-xr-x  1 unknown Users 4953 Jan 24 14:22 test-O0
-rwxr-xr-x  1 unknown Users 4857 Jan 24 14:22 test-O1
-rwxr-xr-x  1 unknown Users 4849 Jan 24 14:22 test-O2
-rwxr-xr-x  1 unknown Users 4849 Jan 24 14:22 test-O3
-rwxr-xr-x  1 unknown Users 5109 Jan 24 14:22 test-O3-unroll
[unknown@ws-unknown test]$ time ./test-O0
res = 4e+38

real    0m7.941s
user    0m7.848s
sys     0m0.001s
[unknown@ws-unknown test]$ time ./test-O1
res = 4e+38

real    0m4.714s
user    0m4.670s
sys     0m0.002s
[unknown@ws-unknown test]$ time ./test-O2
res = 4e+38

real    0m4.785s
user    0m4.611s
sys     0m0.001s
[unknown@ws-unknown test]$ time ./test-O3
res = 4e+38

real    0m2.830s
user    0m2.714s
sys     0m0.002s
[unknown@ws-unknown test]$ time ./test-O3-unroll
res = 4e+38

real    0m3.917s
user    0m3.829s
sys     0m0.001s


That's what I get here.

Г. Д. Сотиров

edmon

  • Guest
« Reply #2: Jan 24, 2005, 18:25 »
Yeah, as I wrote, the problem is with Debian testing.

Joro

  • Advanced
  • *****
  • Posts: 27
« Reply #3: Jan 28, 2005, 21:02 »
Here you go...
Someone once told me that gcc -O3 works better than gcc -O2 on my machine because my L2 cache is fairly large, 1 MB in this case.
I don't know much about programming. Back then I looked for a way to test this but didn't find one.
Well, now I've tested it, and it really does seem to be better :))

joro@najoro:~/cpp$ uname -a
Linux najoro 2.6.8-2-k7 #1 Sat Jan 8 15:48:58 EST 2005 i686 GNU/Linux
joro@najoro:~/cpp$ cat /etc/debian_version
3.1
joro@najoro:~/cpp$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 4
model name      : AMD Athlon(tm) 64 Processor 3000+
stepping        : 8
cpu MHz         : 1795.513
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips        : 3555.32
joro@najoro:~$ gcc --version
gcc (GCC) 3.3.5 (Debian 1:3.3.5-5)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

joro@najoro:~/cpp$ gcc -Wall -O0 -o test-O0 test.c
joro@najoro:~/cpp$ gcc -Wall -O1 -o test-O1 test.c
joro@najoro:~/cpp$ gcc -Wall -O2 -o test-O2 test.c
joro@najoro:~/cpp$ gcc -Wall -O3 -o test-O3 test.c
joro@najoro:~/cpp$ gcc -Wall -O3 -funroll-loops -o test-O3-unroll test.c

joro@najoro:~/cpp$ time ./test-O0
res = 4e+38

real    0m4.332s
user    0m4.224s
sys     0m0.002s

joro@najoro:~/cpp$ time ./test-O1
res = 4e+38

real    0m2.929s
user    0m2.807s
sys     0m0.002s

joro@najoro:~/cpp$ time ./test-O2
res = 4e+38

real    0m2.871s
user    0m2.727s
sys     0m0.000s

joro@najoro:~/cpp$ time ./test-O3
res = 4e+38

real    0m1.903s
user    0m1.848s
sys     0m0.000s

joro@najoro:~/cpp$ time ./test-O3-unroll
res = 4e+38

real    0m1.855s
user    0m1.825s
sys     0m0.000s
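
A note on the cache explanation: the benchmark touches almost no memory, and the 64 KB-cache Duron results in this thread show a similar -O2 to -O3 gain, so cache size is unlikely to be the cause. One plausible contributor (an assumption, not confirmed anywhere in the thread) is function inlining, which GCC 3.x turns on at -O3 through -finline-functions. The sketch below requests the same thing by hand with `static inline`; `sum_powers` is a hypothetical wrapper introduced only so the loop can be called separately.

```c
#include <stdio.h>

/* Marked static inline so the compiler may paste the body into the
   caller, removing the call overhead from the hot loop. */
static inline double powern(double d, unsigned long n) {
    double x = 1.0;
    unsigned long j;
    for (j = 1; j <= n; j++)
        x *= d;
    return x;
}

/* Hypothetical wrapper around the benchmark loop from test.c. */
static double sum_powers(unsigned long limit) {
    double sum = 0.0;
    unsigned long i;
    for (i = 1; i <= limit; i++)
        sum += powern((double)i, i % 5);
    return sum;
}
```

If inlining is what matters here, a build of this variant at -O2 should land closer to the -O3 timings above. Whether it actually does depends on the compiler version and would need to be measured.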

Йордан

  • Advanced
  • *****
  • Posts: 1451
« Reply #4: Jan 28, 2005, 21:43 »
Sample code

gv@gigavolt:~/bin> uname -a
Linux gigavolt 2.6.8-24.11-default #1 Fri Jan 14 13:01:26 UTC 2005 i686 athlon i386 GNU/Linux
gv@gigavolt:~/bin> cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 3
model name      : AMD Duron(tm) Processor
stepping        : 1
cpu MHz         : 751.502
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr pni syscall mmxext 3dnowext 3dnow
bogomips        : 1486.84

gv@gigavolt:~/bin> gcc --version
gcc (GCC) 3.3.4 (pre 3.3.5 20040809)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gv@gigavolt:~/bin> cat test.c
#include <stdio.h>

double powern (double d, unsigned long n) {
 double x = 1.0;
 unsigned long j;

 for (j = 1; j <= n; j++)
   x *= d;

 return x;
}

int main (void) {
 double sum = 0.0;
 unsigned long i;

 for (i = 1; i <= 100000000; i++) {
   sum += powern (i, i % 5);
 }

 printf ("res = %g\n", sum);
 return 0;
}
gv@gigavolt:~/bin> gcc -Wall -O0 -o test-O0 test.c
gv@gigavolt:~/bin> gcc -Wall -O1 -o test-O1 test.c
gv@gigavolt:~/bin> gcc -Wall -O2 -o test-O2 test.c
gv@gigavolt:~/bin> gcc -Wall -O3 -o test-O3 test.c
gv@gigavolt:~/bin> gcc -Wall -O3 -funroll-loops -o test-O3-unroll test.c
gv@gigavolt:~/bin> ls -l
total 76
-rwxrwxrwx  1 gv users  199 2004-12-04 04:05 niki
-rw-r--r--  1 gv users  319 2005-01-28 21:39 test.c
-rw-r--r--  1 gv users  318 2005-01-28 21:34 test.c~
-rwxr-xr-x  1 gv users 9213 2005-01-28 21:39 test-O0
-rwxr-xr-x  1 gv users 9101 2005-01-28 21:39 test-O1
-rwxr-xr-x  1 gv users 9101 2005-01-28 21:39 test-O2
-rwxr-xr-x  1 gv users 9101 2005-01-28 21:39 test-O3
-rwxr-xr-x  1 gv users 9293 2005-01-28 21:39 test-O3-unroll
drwxr-xr-x  3 gv users 4096 2005-01-21 13:14 Vitosha 27_11_2004
gv@gigavolt:~/bin> time ./test-O0
res = 4e+38

real    0m20.971s
user    0m19.879s
sys     0m0.014s
gv@gigavolt:~/bin> time ./test-O1
res = 4e+38

real    0m12.337s
user    0m11.826s
sys     0m0.007s
gv@gigavolt:~/bin> time ./test-O2
res = 4e+38

real    0m11.684s
user    0m11.174s
sys     0m0.008s
gv@gigavolt:~/bin> time ./test-O3
res = 4e+38

real    0m7.166s
user    0m6.869s
sys     0m0.007s
gv@gigavolt:~/bin> time ./test-O3-unroll
res = 4e+38

real    0m8.454s
user    0m8.198s
sys     0m0.005s

Every post is an answer to a question!!!

Йордан Георгиев
http://ygeorgiev.net/

Skydive

  • Members
  • ***
  • Posts: 4
« Reply #5: Jan 28, 2005, 23:52 »
$ uname -a
Linux slackbox 2.4.29-1 #1 Tue Jan 25 13:56:52 EET 2005 i686 unknown unknown GNU/Linux

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 3
model name      : AMD Duron(tm) Processor
stepping        : 1
cpu MHz         : 755.065
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 1507.32

$ gcc --version
gcc (GCC) 3.3.4

* * *

$ gcc -Wall -O0 test.c -lm
$ time ./a.out
sum = 4e+38
 
real    0m20.662s
user    0m20.630s
sys     0m0.010s


$ gcc -Wall -O1 test.c -lm
$ time ./a.out
sum = 4e+38

real    0m12.142s
user    0m12.140s
sys     0m0.000s


$ gcc -Wall -O2 test.c -lm
$ time ./a.out
sum = 4e+38

real    0m11.734s
user    0m11.730s
sys     0m0.000s


$ gcc -Wall -O3 test.c -lm
$ time ./a.out
sum = 4e+38

real    0m7.004s
user    0m7.000s
sys     0m0.000s


$ gcc -Wall -O3 -funroll-loops test.c -lm
$ time ./a.out
sum = 4e+38

real    0m9.910s
user    0m9.890s
sys     0m0.000s

Slackware powered. Anything's possible!

rat

  • Advanced
  • *****
  • Posts: 266
« Reply #6: Feb 01, 2005, 14:54 »
Sample code

uname -a
Linux joro 2.6.8-24-default #1 Wed Oct 6 09:16:23 UTC 2004 i686 athlon i386 GNU/Linux
----------------------------------------------------
rat@joro:~/test> cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2000+
stepping        : 1
cpu MHz         : 1662.323
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 3293.18
----------------------------------------------------
time ./test0
sum = 4e+38

real    0m23.492s
user    0m9.504s
sys     0m0.013s

rat@joro:~/test> time ./test1
sum = 4e+38

real    0m14.428s
user    0m5.640s
sys     0m0.009s
rat@joro:~/test> time ./test2
sum = 4e+38

real    0m12.726s
user    0m5.330s
sys     0m0.008s
rat@joro:~/test> time ./test3
sum = 4e+38

real    0m7.810s
user    0m3.272s
sys     0m0.003s
rat@joro:~/test> time ./test31-unroll
sum = 4e+38

real    0m9.400s
user    0m3.933s
sys     0m0.003s

The last result surprises me. I would expect it to be better than the second-to-last one.
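
A possible explanation, offered as an assumption rather than something measured in this thread: the inner loop in `powern` runs only `i % 5` times, at most four iterations. An unrolled version then spends most of its time in the extra entry checks and remainder handling, and very little in the unrolled body. The sketch below unrolls by four by hand to show the shape. It is not the code GCC generates, and `powern_unrolled` is a hypothetical name.

```c
#include <stdio.h>

/* Reference version, as in test.c. */
static double powern(double d, unsigned long n) {
    double x = 1.0;
    unsigned long j;
    for (j = 1; j <= n; j++)
        x *= d;
    return x;
}

/* Hand-unrolled by four: one loop test per four multiplications,
   followed by a remainder loop for the leftover 0..3 iterations. */
static double powern_unrolled(double d, unsigned long n) {
    double x = 1.0;
    unsigned long j = 0;

    /* Unrolled body: entered only while at least four steps remain. */
    for (; j + 4 <= n; j += 4) {
        x *= d;
        x *= d;
        x *= d;
        x *= d;
    }

    /* Remainder loop: handles whatever the unrolled body left over. */
    for (; j < n; j++)
        x *= d;

    return x;
}
```

With n never larger than 4, the unrolled body runs at most once (only for n = 4) and every other call goes through the remainder loop alone. The extra branches and larger code are then pure overhead, which would be consistent with -funroll-loops being slower than plain -O3 on most machines in this thread.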