Optimized builds of Sumatra PDF

As a regular user of Sumatra PDF, and given that the latest code changes allow a straightforward x64 build, I was surprised that nobody was making regular builds for that platform.

So I have taken the initiative: starting today I will regularly publish (at least once a month) up-to-date 64-bit Windows builds of Sumatra PDF, in a project I have called SumatraPDF x86/x64 Optimized Builds (SumatraPDFOpt).

Sumatra PDF Opt is built from the latest sources in the development repository, to which I have made no changes whatsoever.

What is new, however, is the toolchain: new compilation options, a new compiler and a different compressor make my binaries smaller and faster.

You can download it and find more details at nikkhokkho.sourceforge.net/static.php?page=SumatraPDFOpt


28 comments on “Optimized builds of Sumatra PDF”

  1. – Using Profile Guided Optimizations (PGO) for the x64 build: 200 KB smaller and 10% faster. Let me know if you are interested in other platforms.

    Could you make PGO builds for SSE2?

  2. Javier Gutiérrez Chamorro (Guti)

    From now on, anonymous, all my builds have PGO enabled. It is more time-consuming, and Visual C++ 2010 does not seem to like PGO SSE2 builds very much, but they are now available at SumatraPDFOpt builds.
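For readers unfamiliar with PGO: with the Visual C++ 2010 toolchain it is a three-phase process. The code is compiled with whole-program optimization and linked into an instrumented binary, that binary is run on representative input, and the program is then relinked using the collected profile. The sketch below only builds the command lines, it executes nothing. The file names (`main.c`, `app.exe`, `training.pdf`) are hypothetical placeholders, this is not SumatraPDF's actual makefile, and using `-bench` as the training workload is an assumption.

```python
"""Sketch of the three PGO phases with the Visual C++ 2010 toolchain.

Only builds command lines; nothing is executed. File names are hypothetical.
"""

def pgo_commands(sources, exe="app.exe", arch=None, training_args=()):
    # Full optimization plus whole-program optimization (/GL), which PGO requires.
    cl_flags = ["/O2", "/GL"]
    if arch:  # e.g. "SSE2" for an x86 build; x64 always has SSE2, so no /arch there
        cl_flags.append(f"/arch:{arch}")
    objs = [s.rsplit(".", 1)[0] + ".obj" for s in sources]
    compile_cmd = ["cl", "/c", *cl_flags, *sources]
    # Phase 1: link an instrumented binary that records execution counts.
    instrument = ["link", "/LTCG:PGINSTRUMENT", f"/OUT:{exe}", *objs]
    # Phase 2: run the instrumented binary on representative workloads;
    # each run writes a .pgc profile file.
    train = [exe, *training_args]
    # Phase 3: relink, letting the optimizer use the collected profile.
    optimize = ["link", "/LTCG:PGOPTIMIZE", f"/OUT:{exe}", *objs]
    return [compile_cmd, instrument, train, optimize]

if __name__ == "__main__":
    for cmd in pgo_commands(["main.c"], arch="SSE2",
                            training_args=["-bench", "training.pdf"]):
        print(" ".join(cmd))
```

The choice of training input matters: the optimizer favors whatever paths the training runs exercised, which is one reason results can differ between the file used for profiling and other documents.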

  3. > Since now anonymous, all my builds have PGO enabled.
    Thank you
    > It is more time-consuming, and seems that Visual C++ 2010 does not like so much the PGO SSE2 builds
    Pity. I hope it does not take too much extra time and that you will be able to keep providing them in the future. It seems that for x86 it is only 10 KB smaller, and I wonder how to test the speedup. Any suggestions?

  4. Javier Gutiérrez Chamorro (Guti)

    Hi anonymous, I will do my best to keep providing PGO-optimized builds for all targets in the future.

    If you are interested in benchmarks, you can see the results I obtained at http://code.google.com/p/sumatrapdf/iss … d=1129#c16

    To run your own tests, you can use the -bench switch like this:
    "c:\Program Files (x86)\utilidades\SumatraPDF.exe" -bench "c:\file.pdf"

    This renders the whole PDF internally, so it is easy to measure the time spent with a stopwatch or a similar tool.

    The size reduction is measured before compressing the executable, compared with the builds without PGO. In any case, I encourage you to run your own tests against the official SumatraPDF and keep us informed of the results.

    Regards.
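Instead of a stopwatch, the -bench runs can be timed from a script. The following is a minimal Python sketch, assuming that `SumatraPDF.exe -bench file.pdf` exits once rendering finishes. It measures wall-clock (global) time with `time.perf_counter()` and, on Unix-like systems only, the child's user and kernel CPU time through the child counters of `os.times()`. The script name `time_runs.py` is hypothetical.

```python
"""Time repeated runs of a command, similar to what Timer.exe reports."""
import os
import statistics
import subprocess
import sys
import time

def time_runs(cmd, runs=3):
    """Run `cmd` `runs` times; return a list of (wall, user, kernel) seconds.

    User/kernel times come from the os.times() child counters, which are
    filled in on Unix-like systems; on Windows they are always 0, so only
    the wall (global) time is meaningful there.
    """
    results = []
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        wall = time.perf_counter() - start
        after = os.times()
        user = after.children_user - before.children_user
        kernel = after.children_system - before.children_system
        results.append((wall, user, kernel))
    return results

if __name__ == "__main__":
    # Hypothetical usage: python time_runs.py SumatraPDF.exe -bench file.pdf
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    runs = time_runs(cmd)
    for i, (wall, user, kernel) in enumerate(runs, 1):
        print(f"run {i}: global={wall:.3f}s user={user:.3f}s kernel={kernel:.3f}s")
    print(f"median global time: {statistics.median(r[0] for r in runs):.3f}s")
```

Reporting the median of several runs rather than a single run reduces the weight of one-off disk or cache effects.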

  5. Hi Guti, thank you for the -bench switch; it does not seem to be documented on the official page. So I went ahead and ran some quick tests.

    System:
    Intel Core Duo L2400@1.66GHz/1GB DDR2@667MHz/GMA950
    WinXP Pro SP3 up to date, HAL v5.1.2600.5512 (xpsp.080413-2111)
    Timed using Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31 from http://sourceforge.net/projects/sevenma … Benchmark/

    Could not find SumatraPDF-prerelease-2813.exe, but according to the change log http://code.google.com/p/sumatrapdf/updates/list the engine was not changed between 2813 and 2816:
    r2814 (skip invalid content streams (fixes issue 1239))
    r2815 (ignore author-set zoom in bookmarks (fixes issue 1227))
    r2816 (add -fullscreen command line switch (cf. issue 1238))
    Also tested 2818, with the newly synced MuPDF engine:
    r2817 (merge latest MuPDF update)
    r2818 (update Italian translation (contributed by Alessandro Marian…)

    Test files:
    From: http://help.adobe.com/en_US/photoshop/c … 5_help.pdf 42'358'932 bytes CRC: B0507DA7 SHA-256: 1292E808B73C0D2E90EF6FFE39C3B4615AACFCCB781D747659C0F49FD30CC51B
    SumatraPDF-SSE2+PGO-2813.exe 1'654'272 bytes CRC: 447BCA0E SHA-256: 1F7062EA2FE6C5EDD61246D867A5A7C2CF893A3AD4A3A3CBA79D0B484791D611
    run 1
    Kernel Time = 3.125 = 12%
    User Time = 21.890 = 86%
    Process Time = 25.015 = 99%
    Global Time = 25.184 = 100%
    run 2
    Kernel Time = 3.390 = 13%
    User Time = 21.578 = 85%
    Process Time = 24.968 = 98%
    Global Time = 25.234 = 100%
    run 3
    Kernel Time = 3.265 = 12%
    User Time = 21.875 = 81%
    Process Time = 25.140 = 93%
    Global Time = 26.987 = 100%

    SumatraPDF-prerelease-2816.exe 1'618'944 bytes CRC: 3EA3D8F1 SHA-256: DADD1668564A2AADB1575FADF79CC4B7722C1B2CF230A0CE3185D4CF9F96E774
    run 1
    Kernel Time = 3.437 = 10%
    User Time = 28.453 = 88%
    Process Time = 31.890 = 99%
    Global Time = 32.203 = 100%
    run 2
    Kernel Time = 3.046 = 9%
    User Time = 28.812 = 89%
    Process Time = 31.859 = 99%
    Global Time = 32.115 = 100%
    run 3
    Kernel Time = 3.359 = 10%
    User Time = 28.609 = 88%
    Process Time = 31.968 = 99%
    Global Time = 32.225 = 100%

    SumatraPDF-prerelease-2818.exe 1'620'992 bytes CRC: 836E5F20 SHA-256: 2BA16B6965B4A033E8B5F8979C8EFFB1643137B1D863E024C330FCB1FBB81519
    run 1
    Kernel Time = 3.171 = 9%
    User Time = 28.765 = 89%
    Process Time = 31.937 = 98%
    Global Time = 32.288 = 100%
    run 2
    Kernel Time = 2.937 = 9%
    User Time = 29.031 = 90%
    Process Time = 31.968 = 99%
    Global Time = 32.115 = 100%
    run 3
    Kernel Time = 3.437 = 10%
    User Time = 28.500 = 88%
    Process Time = 31.937 = 99%
    Global Time = 32.203 = 100%

    Could not find SlickEdit_User_Guide.pdf, so I tried another file reported as slow on the SumatraPDF issue tracker, http://code.google.com/p/sumatrapdf/iss … id=1218#c4 Alice_s_Adventures_in_Wonderland.pdf 6'920'148 bytes CRC: 1D1152DF SHA-256: 1DF7AFC064C74D655D5557EC2991C798A4482BA54DB911FE64717BC902E56FFD, but it ran out of memory when the process reached 1.5 GB. 🙁
    So I settled on testing with this file:
    From: https://sunsolve.sun.co.jp/data/816/816-0996-11/pdf/816-0996-11.pdf 74290004 bytes CRC: C306F71C SHA-256: 146FB30E427B5A76D29DC32E70F9FF69FB2AD56FE8E9BB185520EBB18A297439

    SumatraPDF-SSE2+PGO-2813.exe 1'654'272 bytes CRC: 447BCA0E SHA-256: 1F7062EA2FE6C5EDD61246D867A5A7C2CF893A3AD4A3A3CBA79D0B484791D611
    run 1
    Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
    Kernel Time = 2.671 = 8%
    User Time = 21.781 = 72%
    Process Time = 24.453 = 81%
    Global Time = 30.017 = 100%
    run 2
    Kernel Time = 2.484 = 10%
    User Time = 21.796 = 88%
    Process Time = 24.281 = 99%
    Global Time = 24.502 = 100%
    run 3
    Kernel Time = 2.531 = 10%
    User Time = 21.843 = 89%
    Process Time = 24.375 = 99%
    Global Time = 24.530 = 100%

    SumatraPDF-prerelease-2816.exe 1'618'944 bytes CRC: 3EA3D8F1 SHA-256: DADD1668564A2AADB1575FADF79CC4B7722C1B2CF230A0CE3185D4CF9F96E774
    run 1
    Kernel Time = 2.765 = 9%
    User Time = 25.968 = 89%
    Process Time = 28.734 = 99%
    Global Time = 28.978 = 100%
    run 2
    Kernel Time = 2.531 = 8%
    User Time = 26.234 = 90%
    Process Time = 28.765 = 99%
    Global Time = 28.944 = 100%
    run 3
    Kernel Time = 2.390 = 8%
    User Time = 26.375 = 91%
    Process Time = 28.765 = 99%
    Global Time = 28.901 = 100%

    SumatraPDF-prerelease-2818.exe 1'620'992 bytes CRC: 836E5F20 SHA-256: 2BA16B6965B4A033E8B5F8979C8EFFB1643137B1D863E024C330FCB1FBB81519
    run 1
    Kernel Time = 2.734 = 9%
    User Time = 26.062 = 89%
    Process Time = 28.796 = 99%
    Global Time = 29.025 = 100%
    run 2
    Kernel Time = 2.843 = 9%
    User Time = 25.921 = 89%
    Process Time = 28.765 = 99%
    Global Time = 28.930 = 100%
    run 3
    Kernel Time = 2.437 = 8%
    User Time = 26.250 = 90%
    Process Time = 28.687 = 99%
    Global Time = 28.874 = 100%

    Summary:
    SSE2+PGO seems to provide a nice 15% speedup on the test file and up to 22% on the profile file.
    The updated MuPDF merged in version 2817 has the same speed as the previous one.
    The final compressed SSE2+PGO file is 35 KB (2%) larger than the official build.

    Next: I need a tool that also shows memory usage; better still, the -bench switch could report it in its output. Ideally, it could also integrate the timer.exe code, which is public domain, to be self-sufficient and maybe even automate speed tests.
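The Timer output above can be parsed and the builds compared with a few lines of Python. A minimal sketch; `parse_timer` and `time_reduction_pct` are illustrative names. It assumes every result line has the form `<Name> Time = <seconds> = <percent>%`, as in the runs reported here, and it reports speedup as the relative reduction in mean time.

```python
"""Parse Timer 9.01 output and compare two builds (illustrative sketch)."""
import re
import statistics

# Matches lines such as "Process Time = 25.015 = 99%"; captures name and seconds.
LINE = re.compile(r"^\s*(\w+) Time = ([\d.]+) = \d+%", re.MULTILINE)

def parse_timer(text):
    """Group the reported times by name: Kernel, User, Process, Global."""
    times = {}
    for name, seconds in LINE.findall(text):
        times.setdefault(name, []).append(float(seconds))
    return times

def time_reduction_pct(baseline, candidate):
    """How much less time (in %) the candidate needs, comparing run means."""
    b, c = statistics.mean(baseline), statistics.mean(candidate)
    return (b - c) / b * 100

sample = """run 1
Kernel Time = 3.125 = 12%
User Time = 21.890 = 86%
Process Time = 25.015 = 99%
Global Time = 25.184 = 100%
"""

if __name__ == "__main__":
    print(parse_timer(sample))
    # Process Time values reported above for photoshop_cs5_help.pdf.
    official_2816 = [31.890, 31.859, 31.968]
    sse2_pgo_2813 = [25.015, 24.968, 25.140]
    pct = time_reduction_pct(official_2816, sse2_pgo_2813)
    ratio = statistics.mean(official_2816) / statistics.mean(sse2_pgo_2813)
    print(f"{pct:.1f}% less time, {ratio:.2f}x as fast")
```

With the Process Time figures above this prints "21.5% less time, 1.27x as fast", which matches the "up to 22%" summary if that figure is based on Process Time. The same calculation for 816-0996-11.pdf gives about 15.2%. A 21.5% time reduction corresponds to roughly 27% more throughput, so it matters which definition of "speedup" is being quoted.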

  6. Javier Gutiérrez Chamorro (Guti)

    Thank you very much for your detailed benchmarks, anonymous, and I am glad to see this 15% speed improvement.
    As for the size, consider that my builds use ASMLib, which adds an overhead of about 15 KB for a 0.5% overall speed increase, and have some SyncText features enabled. Those features are not available in the official prerelease, so extra code is required.
    Also, since my builds favor fast code through enhanced compilation switches, they probably generate slightly larger code, even more so considering that SSE/SSE2 and x64 instructions have larger encodings.
    I will try to improve these figures in the future. Do not hesitate to suggest any ideas you may have for even faster builds.
    I have been looking for that memory tool but did not find one. In any case, expect a slight increase in memory usage in my builds for the reasons mentioned.
    According to Process Explorer, it is in the range of 1%-2%.

  7. > As for the size, consider that my builds use ASMLib that require an overhead of about 15 KB, for a benefit of a 0.5% overall speed increase, and have some SyncText features enabled. Those features, are not available in the official prerelease, so extra code is required.
    > Also, since my builds favor fast code due to enhanced compilation switches, they are probably generating slightly larger code. More if we consider that SSE/SSE2 and x64 instruction sets, has bigger sizes.
    Oh, don’t worry, I was just reporting it for the sake of completeness.

    > Do not hesitate suggesting any ideas you could have to build even faster compiles.
    Well, I’m not versed enough in software optimization and compilers to do so. But I can do some testing if you want. I saw that in the issue database you posted a comparison of different compiler settings. If you are interested and provide the executables, I can run the same tests on my computer.
    It may be a stupid idea, but can a universal x86 executable be compiled? One that automatically uses SSE/SSE2 if the CPU supports them?
    > I have been looking for that memory tool but did not find any. Anyway, expect a slight memory usage increase in my builds due to the reasons mentioned.
    > Using ProcessExplorer, they are in the range of 1%-2%.
    Yes, the usual space vs. time trade-off. I don’t mind that, but it would be interesting to measure it quantitatively.
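To put a number on the memory side of the trade-off, peak memory can be read once the process exits. This is a minimal Unix-only sketch that assumes Python's `resource` module, which is unavailable on Windows. There, a tool such as Process Explorer remains the practical option. The script name `peak_mem.py` is hypothetical.

```python
"""Peak resident memory of a child process (Unix-only sketch)."""
import resource
import subprocess
import sys

def peak_child_rss_kib(cmd):
    """Run `cmd` and return the largest peak RSS among finished children, in KiB.

    RUSAGE_CHILDREN reports the maximum over *all* children this process has
    waited for, so measure each build from a fresh Python process.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # macOS reports bytes; Linux already reports KiB.
    return peak

if __name__ == "__main__":
    # Hypothetical usage: python peak_mem.py <program> <args...>
    cmd = sys.argv[1:] or [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
    print(f"peak RSS: {peak_child_rss_kib(cmd) / 1024:.1f} MiB")
```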

  8. I get "La página a la que intentas acceder, no se encuentra disponible en este momento." ("The page you are trying to access is not available at the moment.") when using the link above. Wrong address or access rights?

  9. I ran the benchmarks, but was quite busy, so I could not upload them earlier. Two CPUs were tested this time: Intel Core Duo L2400@1.66GHz and P4@2,66GHz.

    Timed using Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31 from http://sourceforge.net/projects/sevenmax/files/7-Benchmark/9.03/
    Test files:
    From: http://help.adobe.com/en_US/photoshop/cs/using/photoshop_cs5_help.pdf 42’358’932 bytes CRC: B0507DA7 SHA-256: 1292E808B73C0D2E90EF6FFE39C3B4615AACFCCB781D747659C0F49FD30CC51B
    From: https://sunsolve.sun.co.jp/data/816/816-0996-11/pdf/816-0996-11.pdf 74290004 bytes CRC: C306F71C SHA-256: 146FB30E427B5A76D29DC32E70F9FF69FB2AD56FE8E9BB185520EBB18A297439

    Could not find SumatraPDF-prerelease-2833.exe, but according to the change log http://code.google.com/p/sumatrapdf/updates/list 2833 and 2831 should be identical:
    r2833 (remove unneeded #ifdefs from the installer, revert r2832)
    SumatraPDF-prerelease-2831.exe 1622528 bytes CRC: D8419F2F SHA-256: AD95F45FA7B796E95AF9667FE16EE533D86706A07DAB1D25A5D5A2076EF5252B
    SumatraPDF-SSE2+PGO-2833.exe 1655808 bytes CRC: 6D5DE11F SHA-256: 1A59D643A8F8532393C450537ABB043D08C4FD5296FF3CA2406FADC690B16547

    System: Intel Core Duo L2400@1.66GHz/1GB DDR2@667MHz/GMA950
    WinXP Pro SP3 up to date, HAL.DLL v5.1.2600.5512 (xpsp.080413-2111), pagefile.sys 1GB
    SumatraPDF-prerelease-2831.exe
    run1 photoshop_cs5_help.pdf
    Kernel Time = 3.375 = 10%
    User Time = 28.812 = 88%
    Process Time = 32.187 = 98%
    Global Time = 32.696 = 100%
    run2
    Kernel Time = 3.546 = 10%
    User Time = 28.500 = 88%
    Process Time = 32.046 = 99%
    Global Time = 32.263 = 100%
    run3
    Kernel Time = 3.515 = 10%
    User Time = 28.703 = 88%
    Process Time = 32.218 = 99%
    Global Time = 32.532 = 100%
    run1 816-0996-11.pdf
    Kernel Time = 3.109 = 6%
    User Time = 26.312 = 56%
    Process Time = 29.421 = 62%
    Global Time = 46.850 = 100%
    run2
    Kernel Time = 2.484 = 6%
    User Time = 26.390 = 73%
    Process Time = 28.875 = 80%
    Global Time = 35.759 = 100%
    run3
    Kernel Time = 2.765 = 7%
    User Time = 26.078 = 73%
    Process Time = 28.843 = 81%
    Global Time = 35.578 = 100%

    SumatraPDF-SSE2+PGO-2833.exe
    run1 photoshop_cs5_help.pdf
    Kernel Time = 3.531 = 13%
    User Time = 21.671 = 80%
    Process Time = 25.203 = 93%
    Global Time = 26.866 = 100%
    run2
    Kernel Time = 3.296 = 12%
    User Time = 21.843 = 85%
    Process Time = 25.140 = 98%
    Global Time = 25.453 = 100%
    run3
    Kernel Time = 3.250 = 12%
    User Time = 21.921 = 86%
    Process Time = 25.171 = 99%
    Global Time = 25.377 = 100%
    run1 816-0996-11.pdf
    Kernel Time = 2.843 = 10%
    User Time = 21.640 = 81%
    Process Time = 24.484 = 91%
    Global Time = 26.659 = 100%
    run2
    Kernel Time = 2.640 = 10%
    User Time = 21.828 = 85%
    Process Time = 24.468 = 96%
    Global Time = 25.434 = 100%
    run3
    Kernel Time = 2.921 = 11%
    User Time = 21.515 = 87%
    Process Time = 24.437 = 99%
    Global Time = 24.600 = 100%
    The same speedup: ~15% on the test file and ~22% on the profile file.

    System: P4@2,66GHz/1GB DDR@266MHz/IGP345M
    WinXP Pro SP3 up to date, HAL.DLL v5.1.2600.5512 (xpsp.080413-2111), pagefile 500MB
    SumatraPDF-prerelease-2831.exe 1’622’528
    run 1 photoshop_cs5_help.pdf
    Kernel Time = 13.799 = 18%
    User Time = 46.186 = 62%
    Process Time = 59.986 = 81%
    Global Time = 73.809 = 100%
    run 2
    Kernel Time = 18.416 = 18%
    User Time = 74.577 = 76%
    Process Time = 92.993 = 95%
    Global Time = 97.134 = 100%
    run 3
    Kernel Time = 18.897 = 19%
    User Time = 74.547 = 75%
    Process Time = 93.444 = 94%
    Global Time = 98.470 = 100%
    run 1 816-0996-11.pdf
    Kernel Time = 13.689 = 16%
    User Time = 64.843 = 78%
    Process Time = 78.532 = 95%
    Global Time = 82.418 = 100%
    run 2
    Kernel Time = 13.849 = 16%
    User Time = 65.053 = 79%
    Process Time = 78.903 = 96%
    Global Time = 82.132 = 100%
    run 3
    Kernel Time = 14.771 = 17%
    User Time = 65.734 = 77%
    Process Time = 80.505 = 94%
    Global Time = 85.106 = 100%

    SumatraPDF-SSE2+PGO-2833.exe 1’655’808
    run 1 photoshop_cs5_help.pdf
    Kernel Time = 18.777 = 21%
    User Time = 60.557 = 70%
    Process Time = 79.334 = 92%
    Global Time = 85.889 = 100%
    run 2
    Kernel Time = 18.696 = 22%
    User Time = 60.607 = 72%
    Process Time = 79.304 = 95%
    Global Time = 83.376 = 100%
    run 3
    Kernel Time = 18.546 = 22%
    User Time = 60.957 = 72%
    Process Time = 79.504 = 94%
    Global Time = 83.711 = 100%
    run 1 816-0996-11.pdf
    Kernel Time = 7.380 = 15%
    User Time = 30.784 = 64%
    Process Time = 38.164 = 80%
    Global Time = 47.424 = 100%
    run 2
    Kernel Time = 10.755 = 19%
    User Time = 41.659 = 76%
    Process Time = 52.415 = 95%
    Global Time = 54.703 = 100%
    run 3
    Kernel Time = 13.349 = 17%
    User Time = 56.270 = 75%
    Process Time = 69.620 = 93%
    Global Time = 74.709 = 100%
    run 4
    Kernel Time = 14.450 = 19%
    User Time = 56.290 = 75%
    Process Time = 70.741 = 94%
    Global Time = 74.725 = 100%

    For some reason the measurements are unstable and tend to get worse with each subsequent test, but the SSE2+PGO builds are still ahead. It could have been Intel SpeedStep, but after checking, this processor does not support it. I will try to investigate further with the next version. 🙂
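The drift described above can be quantified for each series with the coefficient of variation (CV), which is the standard deviation divided by the mean. This is a short sketch. The 5% threshold is an arbitrary assumption, not a standard value. The sample values are the P4 Process Times for SSE2+PGO on 816-0996-11.pdf and the Core Duo Process Times for prerelease-2831 on photoshop_cs5_help.pdf, as reported above.

```python
"""Flag benchmark series whose run-to-run spread is too large to trust."""
import statistics

def coefficient_of_variation(times):
    """Sample standard deviation divided by the mean (needs at least two runs)."""
    return statistics.stdev(times) / statistics.mean(times)

def is_unstable(times, threshold=0.05):
    """True if the spread exceeds `threshold` (5% is an arbitrary default)."""
    return coefficient_of_variation(times) > threshold

if __name__ == "__main__":
    p4_sse2_pgo_816 = [38.164, 52.415, 69.620, 70.741]   # Process Time, s
    core_duo_2831_photoshop = [32.187, 32.046, 32.218]   # Process Time, s
    for label, series in (("P4", p4_sse2_pgo_816),
                          ("Core Duo", core_duo_2831_photoshop)):
        cv = coefficient_of_variation(series)
        print(f"{label}: CV={cv:.3f} unstable={is_unstable(series)}")
```

The P4 series gives a CV of about 0.27, while the Core Duo series stays around 0.003, so only the P4 series would be flagged for re-running.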

  10. Javier Gutiérrez Chamorro (Guti)

    Those are magnificent and consistent results.

    For your information, there is an update available at r2958. But the good news is that, after that and up to the current source at r2998, the SumatraPDF team has made some more optimizations.

    Additionally, I plan to upgrade to Visual Studio 2010 SP1, that should hopefully give a slight improvement.

  11. Yes, I already downloaded your build r2958, but in the light of what you are saying I will wait for your next build before re-running the tests.
    It will be interesting to see whether SP1 is any faster, even though I haven’t seen anything about speed optimisations except for upcoming processors: http://support.microsoft.com/kb/983509

    On another note, do you have access to the Intel compiler? It is rumoured to produce the fastest code, and with the code that crippled performance on AMD CPUs finally removed, it could be worth trying.

    > the good news is that after that and up to the current source at level of r2998 the SumatraPDF team has made some more optimizations.
    Are you sure about this one? I just downloaded the newly released version 1.4, and it times the same as prerelease-2831, sometimes even slower: 34 vs 32 and 31 vs 29 on the two test files. Or perhaps the 1.4 branch was created before r2998?

  13. Javier Gutiérrez Chamorro (Guti)

    Although it has not been officially announced yet (I will be running some tests over the next few days), the SumatraPDFOpt download page now contains r3002 builds, made with Visual C++ 2010 SP1. They seem to be a bit faster on x64, but the same on the other targets.

    I am aware of Intel C++, and in fact tried to build SumatraPDF with it some weeks ago. The problem is that SumatraPDF relies on makefiles, and figuring out how to build it was neither transparent nor trivial, so I had no success. Maybe I will try again later.

    I see those optimizations as smaller executables, thanks to the cleanup of lots of unused code, but I have not done performance tests, so maybe you are right and they are the same speed. Or maybe rewriting some packages in C++ has made it a bit slower. This will need further investigation.

  14. > the download page of SumatraPDFOpt contains builds r3002 now, made with Visual C++ 2010 SP1.
    I tried to download a close version of regular Sumatra, but it seems that anything before 3013 does not exist. 🙁 I can’t even understand, from reading the http://code.google.com/p/sumatrapdf/updates/list, what revision would the release 1.4 correspond to. So what version should I compare it to?
    > I am aware of Intel C++, and in fact tried to build SumatraPDF with it some weeks ago. The problem is that SumatraPDF relies on makefiles, and it was not transparent nor trivial to figure out how to build, so no success on that. Maybe will try again later.
    Good luck then, and keep us informed.
    > I see those optimizations in smaller executables, by cleaning-up lots of unused code,
    Indeed, version 2958 is smaller by 15kB, compared to 2833.
    > but have not done performance tests, so maybe you are right, and they are the same speed. Or maybe the rewrite on C++ of some packages, have made it a bit slower. Will need further investigation on that.
    Ready and willing! Just post the executables 🙂

  15. Thank you.
    Since there was a corresponding pre-release version 3116, I did some more extended testing and added x86 generic executable. Also tried the SSE build, but it turned out to be binary identical to the SSE2. Does it mean that there are no possible SSE2 optimizations in the code? Or are the executables actually “universal” and contain both types of optimizations? Unfortunately, I don’t have an SSE-only computer to try it out.
    Here are the results:
    Timed using Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31 from http://sourceforge.net/projects/sevenmax/files/7-Benchmark/9.03/
    Test files:
    From: http://help.adobe.com/en_US/photoshop/cs/using/photoshop_cs5_help.pdf 42’358’932 bytes CRC: B0507DA7 SHA-256: 1292E808B73C0D2E90EF6FFE39C3B4615AACFCCB781D747659C0F49FD30CC51B
    From: https://sunsolve.sun.co.jp/data/816/816-0996-11/pdf/816-0996-11.pdf 74290004 bytes CRC: C306F71C SHA-256: 146FB30E427B5A76D29DC32E70F9FF69FB2AD56FE8E9BB185520EBB18A297439

    Executables:
    SumatraPDF-prerelease-3116.exe 1553920 bytes CRC: 7E036DC0 SHA-256: 4CFCFD02B7DC6F8583FBCA8FD750289DFBE47B1083279FFE1E98F5C6C872DF29
    SumatraPDF-SSE2+PGO-3116.exe 1571328 bytes CRC: 8D7BD52A SHA-256: BB46B89251BD05888303E23F6527C1A5D9F6EF86762F537E1B2B448ED2C1B266
    SumatraPDF-PGO-3116.exe (plain x86) 1567232 bytes CRC: 3C02D5F7 SHA-256: 1CF9C92AD85B5AA556B2100B34BE985432001DB369C5B974B7E7104BFFA55FE6

    System: Intel Core Duo L2400@1.66GHz/1GB DDR2@667MHz/GMA950
    WinXP Pro SP3 up to date, HAL.DLL v5.1.2600.5512 (xpsp.080413-2111), pagefile.sys 1GB

    photoshop_cs5_help.pdf
    SumatraPDF-prerelease-3116.exe
    run1
    Kernel Time = 3.421 = 10%
    User Time = 30.546 = 94%
    Process Time = 33.968 = 105%
    Global Time = 32.340 = 100%
    run2
    Kernel Time = 3.640 = 11%
    User Time = 30.687 = 94%
    Process Time = 34.328 = 106%
    Global Time = 32.317 = 100%
    run3
    Kernel Time = 3.687 = 11%
    User Time = 30.203 = 93%
    Process Time = 33.890 = 104%
    Global Time = 32.369 = 100%

    SumatraPDF-SSE2+PGO-3116.exe
    run1
    Kernel Time = 3.156 = 12%
    User Time = 23.171 = 92%
    Process Time = 26.328 = 104%
    Global Time = 25.094 = 100%
    run2
    Kernel Time = 3.468 = 13%
    User Time = 22.781 = 90%
    Process Time = 26.250 = 104%
    Global Time = 25.065 = 100%
    run3
    Kernel Time = 3.375 = 13%
    User Time = 23.062 = 92%
    Process Time = 26.437 = 105%
    Global Time = 25.019 = 100%

    SumatraPDF-PGO-3116.exe
    run1
    Kernel Time = 2.953 = 11%
    User Time = 23.906 = 94%
    Process Time = 26.859 = 106%
    Global Time = 25.214 = 100%
    run2
    Kernel Time = 3.515 = 13%
    User Time = 22.890 = 88%
    Process Time = 26.406 = 102%
    Global Time = 25.817 = 100%
    run3
    Kernel Time = 3.281 = 13%
    User Time = 23.437 = 93%
    Process Time = 26.718 = 106%
    Global Time = 25.162 = 100%

    816-0996-11.pdf
    SumatraPDF-prerelease-3116.exe
    run1
    Kernel Time = 2.859 = 9%
    User Time = 28.156 = 94%
    Process Time = 31.015 = 103%
    Global Time = 29.923 = 100%
    run2
    Kernel Time = 2.593 = 8%
    User Time = 28.062 = 96%
    Process Time = 30.656 = 104%
    Global Time = 29.228 = 100%
    run3
    Kernel Time = 2.562 = 8%
    User Time = 27.500 = 94%
    Process Time = 30.062 = 102%
    Global Time = 29.249 = 100%

    SumatraPDF-SSE2+PGO-3116.exe
    run1
    Kernel Time = 2.640 = 10%
    User Time = 23.656 = 96%
    Process Time = 26.296 = 106%
    Global Time = 24.634 = 100%
    run2
    Kernel Time = 2.546 = 10%
    User Time = 23.468 = 95%
    Process Time = 26.015 = 105%
    Global Time = 24.650 = 100%
    run3
    Kernel Time = 3.109 = 12%
    User Time = 22.781 = 92%
    Process Time = 25.890 = 105%
    Global Time = 24.648 = 100%

    SumatraPDF-PGO-3116.exe
    run1
    Kernel Time = 3.078 = 12%
    User Time = 22.171 = 89%
    Process Time = 25.250 = 102%
    Global Time = 24.744 = 100%
    run2
    Kernel Time = 2.468 = 9%
    User Time = 23.234 = 93%
    Process Time = 25.703 = 103%
    Global Time = 24.765 = 100%
    run3
    Kernel Time = 3.140 = 12%
    User Time = 22.875 = 92%
    Process Time = 26.015 = 105%
    Global Time = 24.736 = 100%
    The performance stays the same: ~22% faster for the PGO file and ~15% for the test file. The file size increase is ~1% (13-17 KB).

    System: P4@2,66GHz/1GB DDR@266MHz/IGP345M
    WinXP Pro SP3 up to date, HAL.DLL v5.1.2600.5512 (xpsp.080413-2111), pagefile 500MB
    photoshop_cs5_help.pdf
    SumatraPDF-prerelease-3116.exe
    run1
    Kernel Time = 14.110 = 18%
    User Time = 50.322 = 65%
    Process Time = 64.432 = 84%
    Global Time = 76.429 = 100%
    run2
    Kernel Time = 14.560 = 19%
    User Time = 52.625 = 69%
    Process Time = 67.186 = 88%
    Global Time = 75.747 = 100%
    run3
    Kernel Time = 14.380 = 19%
    User Time = 52.275 = 69%
    Process Time = 66.655 = 89%
    Global Time = 74.731 = 100%

    SumatraPDF-SSE2+PGO-3116.exe
    run1
    Kernel Time = 13.829 = 21%
    User Time = 42.000 = 66%
    Process Time = 55.830 = 88%
    Global Time = 63.351 = 100%
    run2
    Kernel Time = 13.779 = 22%
    User Time = 42.180 = 67%
    Process Time = 55.960 = 89%
    Global Time = 62.509 = 100%
    run3
    Kernel Time = 13.940 = 22%
    User Time = 41.780 = 66%
    Process Time = 55.720 = 88%
    Global Time = 63.283 = 100%

    SumatraPDF-PGO-3116.exe
    run1
    Kernel Time = 13.669 = 20%
    User Time = 42.350 = 65%
    Process Time = 56.020 = 85%
    Global Time = 65.140 = 100%
    run2
    Kernel Time = 14.220 = 22%
    User Time = 43.282 = 67%
    Process Time = 57.502 = 89%
    Global Time = 64.376 = 100%
    run3
    Kernel Time = 14.540 = 21%
    User Time = 41.499 = 62%
    Process Time = 56.040 = 84%
    Global Time = 66.472 = 100%

    816-0996-11.pdf
    SumatraPDF-prerelease-3116.exe
    run1
    Kernel Time = 9.794 = 5%
    User Time = 46.677 = 25%
    Process Time = 56.471 = 31%
    Global Time = 180.498 = 100%
    run2
    Kernel Time = 9.643 = 7%
    User Time = 45.685 = 33%
    Process Time = 55.329 = 40%
    Global Time = 136.496 = 100%
    run3
    Kernel Time = 7.981 = 4%
    User Time = 45.986 = 27%
    Process Time = 53.967 = 31%
    Global Time = 169.129 = 100%

    SumatraPDF-SSE2+PGO-3116.exe
    run1
    Kernel Time = 6.649 = 4%
    User Time = 38.946 = 26%
    Process Time = 45.595 = 30%
    Global Time = 149.705 = 100%
    run2
    Kernel Time = 11.416 = 21%
    User Time = 39.636 = 73%
    Process Time = 51.053 = 95%
    Global Time = 53.717 = 100%
    run3
    Kernel Time = 11.306 = 20%
    User Time = 41.079 = 74%
    Process Time = 52.385 = 95%
    Global Time = 54.968 = 100%

    SumatraPDF-PGO-3116.exe
    run1
    Kernel Time = 11.166 = 19%
    User Time = 41.109 = 72%
    Process Time = 52.275 = 91%
    Global Time = 56.987 = 100%
    run2
    Kernel Time = 11.055 = 19%
    User Time = 40.988 = 73%
    Process Time = 52.044 = 92%
    Global Time = 56.117 = 100%
    run3
    Kernel Time = 11.556 = 20%
    User Time = 40.498 = 73%
    Process Time = 52.054 = 94%
    Global Time = 55.085 = 100%

    A ~15% speed-up for the PGO file and 8-10% on the test file. The speed fluctuations observed last time are mostly gone, but their origin is still unknown.

    Overall, the speedup due to optimizations is real and consistent across versions. The test file speedup is ~2/3 of the profile file.
    Plain x86 and SSE2 optimizations produce almost identical results, and SSE and SSE2 executables are identical at the binary level.
    Now that the test chain is in place, it would be interesting to test the WDK build from this thread: http://forums.fofou.org/sumatrapdf/topic?id=1924321, but the executable file at the bottom of the thread is gone. Do you think you can compile a new one using the patches from the thread?
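Python's standard library can reproduce the size/CRC/SHA-256 listings above. It can also confirm whether the SSE and SSE2 builds are the same binary. This is a minimal sketch that uses `zlib.crc32` and `hashlib.sha256`. The function names are illustrative.

```python
"""Fingerprint executables the way the listings above do: size, CRC32, SHA-256."""
import hashlib
import zlib

def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, CRC32 as 8 uppercase hex digits, SHA-256 hex)."""
    size, crc, sha = 0, 0, hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
            sha.update(chunk)
    return size, f"{crc:08X}", sha.hexdigest().upper()

def binary_identical(path_a, path_b):
    """Treat two builds as identical if their sizes and SHA-256 digests match."""
    a, b = fingerprint(path_a), fingerprint(path_b)
    return a[0] == b[0] and a[2] == b[2]

def size_increase_pct(base_size, new_size):
    """Relative size growth of new_size over base_size, in percent."""
    return (new_size - base_size) / base_size * 100
```

For example, `size_increase_pct(1553920, 1571328)` gives about 1.12% for SumatraPDF-SSE2+PGO-3116 over prerelease-3116. `size_increase_pct(1553920, 1567232)` gives about 0.86% for the plain x86 PGO build.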

  16. Javier Gutiérrez Chamorro (Guti)

    Regarding the three x86 builds, here are some clarifications that will probably help:
    – libjpeg-turbo uses runtime CPU detection; this means that even with the regular x86 build, JPEG images will be decoded using MMX, SSE or SSE2 if available.
    – The rest of the code is statically optimized: the x86 version will not use SSE or SSE2 even when available, for instance while rendering to the screen or while decoding the PDF itself.
    – SSE2 binaries are identical to the SSE ones due to a bug in Visual C++ 2010 that makes the compiler crash when SSE2 is combined with PGO. It was reported to Microsoft at http://connect.microsoft.com/VisualStudio/feedback/details/643255/sse2-produces-internal-error-ha-occurred-in-compiler but they were unable to reproduce it. It is probably triggered by an unusual combination of the PGO profile data and particular source code constructs. It did not happen with SumatraPDF r28xx, for instance, and has not been fixed in Visual C++ 2010 SP1. Until it is fixed, the SSE2 builds are the same as the SSE builds, since only the SSE (SSE1) instruction set is used.
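The runtime detection described for libjpeg-turbo follows a simple dispatch pattern. The same pattern would also answer the earlier question about a "universal" x86 executable. CPU features are detected once, and calls are then routed to the most specialized implementation available. The Python sketch below only illustrates that pattern. The decoder functions are hypothetical placeholders. The `/proc/cpuinfo` lookup is a Linux-only stand-in for the CPUID instruction that native code typically queries. None of this reflects libjpeg-turbo's actual API.

```python
"""Runtime CPU dispatch pattern (illustrative; native code would query CPUID)."""

# Hypothetical decoders; they only report which path was chosen.
def decode_sse2(data):
    return "sse2"

def decode_mmx(data):
    return "mmx"

def decode_generic(data):
    return "generic"

# Most specialized first; None means "no special CPU feature required".
IMPLEMENTATIONS = [("sse2", decode_sse2), ("mmx", decode_mmx), (None, decode_generic)]

def cpu_flags():
    """Best-effort feature list from /proc/cpuinfo (Linux on x86 only)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()  # Unknown: fall back to the generic path.

def select_decoder(flags):
    """Return the first implementation whose required feature is present."""
    for feature, impl in IMPLEMENTATIONS:
        if feature is None or feature in flags:
            return impl

# Chosen once at startup, then called like any other function.
decode = select_decoder(cpu_flags())

if __name__ == "__main__":
    print("selected path:", decode(b""))
```

Selecting the function once and then calling through it keeps the per-call cost at a single indirect call. This is why the approach works for hot code such as JPEG decoding, while the statically optimized remainder of the program cannot benefit from it.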

    The problem with the WDK is that it requires building with Visual C++ 2005, which does not support PGO and probably generates slightly slower code.

    On the plus side, it allows dynamic linking against the Visual C++ runtime (msvcrt.dll), so that code is not embedded in the executable. In the best case, this could save up to 260 KB of uncompressed code in the x86 builds and 354 KB in the x64 builds.

    The only drawback, as you have probably read, is that the DLL needs to be installed on the system, but it comes out of the box with all recent versions of Windows.

    In a few words, WDK builds are smaller, but will not be faster at all.

  17. Hello Guti.
    Thank you very much for this detailed explanation. A very instructive reading.
    > About the 3 x86 builds there are some clarifications that probably will help you:
    > – libjpeg-turbo uses CPU runtime detection, this means that even if you use the regular x86 build, when decoding JPEG images, it will use MMX, SSE or SSE2 if available.
    This may explain the very small difference between the normal x86 and SSE builds. But does it mean that most of the speedup comes from image decoding or from PGO? If you can provide corresponding builds I can test it for you.
    > – SSE2 binaries are identical to SSE due to a bug in Visual C++ 2010 that when combined with PGO makes it crash. It was reported to Microsoft at http://connect.microsoft.com/VisualStudio/feedback/details/643255/sse2-produces-internal-error-ha-occurred-in-compiler but they were unable to reproduce it … and has not been solved with Visual C++ 2010 SP1.
    And since the original reporter did not provide more information it is not likely to be fixed, unfortunately. 🙁
    > Problem with WDK is that it requires building with Visual C++ 2005,
    Are you sure? From what I understood from reading the posts, the first patch adds support for VC2010: “patch if you want to give it a go for msvc2010
    http://pastebin.com/PzaLEd4V”

    > The only drawback as you have probably read, is that the DLL file needs to be installed on the system, but it comes out-of-the-box in all recent versions of Windows.
    Correct me if I am wrong: current SumatraPDF builds, official or yours, are statically linked and do not require any external library, while the WDK build will require msvcrt.dll v6.x, which has basically been included with every Windows version since 98SE?
    > In a few words, WDK builds are smaller, but will not be faster at all.
    I’d be interested to give it a try anyway. 😉

  18. Could you also build an x64 version of PdfFilter.dll? This filter is missing for the x64 version of Windows and MS Desktop Search.

    By the way, what is libmupdf.dll in the SumatraPDF folder for? Does it need to be built as x64 to work with your optimized SumatraPDF.exe?
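On the bitness question: a 64-bit Windows process cannot load a 32-bit DLL, so any DLL loaded by a 64-bit SumatraPDF.exe must be x64 as well. The `Machine` field of the PE header shows whether a given EXE or DLL is x86 or x64. The sketch below is minimal and maps only the three machine codes listed to names. The script name `pe_machine.py` is hypothetical.

```python
"""Report whether a Windows EXE/DLL targets x86 or x64 (PE header check)."""
import struct
import sys

# IMAGE_FILE_HEADER.Machine values for the common targets.
MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}

def pe_machine(path):
    with open(path, "rb") as f:
        dos_header = f.read(64)
        if len(dos_header) < 64 or dos_header[:2] != b"MZ":
            raise ValueError(f"{path}: not an MZ executable")
        # e_lfanew at offset 0x3C holds the file offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        nt = f.read(6)  # "PE\0\0" signature followed by the 2-byte Machine field
    if len(nt) < 6 or nt[:4] != b"PE\0\0":
        raise ValueError(f"{path}: PE signature not found")
    (machine,) = struct.unpack_from("<H", nt, 4)
    return MACHINES.get(machine, hex(machine))

if __name__ == "__main__":
    # Hypothetical usage: python pe_machine.py SumatraPDF.exe libmupdf.dll
    for name in sys.argv[1:]:
        print(name, pe_machine(name))
```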

  19. Javier Gutiérrez Chamorro (Guti)

    Hi Cecil,

    I already answered you by email, but here is my reply anyway.
    libmupdf.dll is the MuPDF library, which is used when SumatraPDF is linked dynamically. It makes the EXE smaller at the cost of requiring the DLL, and loading is a bit slower.
    As for the IFilter, I tried to build it some time ago, but it gave some compilation problems under x64. I will try again when I have some time to hack the makefiles.
    Thank you very much, I am glad you liked the extra performance of my builds.

    Regards.
