Are there any compelling performance reasons to choose static linking over dynamic linking or vice versa in certain situations? I've heard or read the following, but I don't know enough on the subject to vouch for its veracity.
1) The difference in runtime performance between static linking and dynamic linking is usually negligible.
2) (1) is not true if using a profiling compiler that uses profile data to optimize program hotpaths because with static linking, the compiler can optimize both your code and the library code. With dynamic linking only your code can be optimized. If most of the time is spent running library code, this can make a big difference. Otherwise, (1) still applies.
Some edits to include the very relevant suggestions in the comments and in other answers. I'd like to note that the way you break on this depends a lot on what environment you plan to run in. Minimal embedded systems may not have enough resources to support dynamic linking. Slightly larger small systems may well support dynamic linking, because their memory is small enough to make the RAM savings from dynamic linking very attractive. Full blown consumer PCs have, as Mark notes, enormous resources, and you can probably let the convenience issues drive your thinking on this matter.
To address the performance and efficiency issues: it depends.
Classically, dynamic libraries require a some kind of glue layer which often means double dispatch or an extra layer of indirection in function addressing and can cost a little speed (but is function calling time actually a big part of your running time???).
However, if you are running multiple processes which all call the same library a lot, you can end up saving cache lines (and thus winning on running performance) when using dynamic linking relative to using static linking. (Unless modern OS's are smart enough to notice identical segments in statically linked binaries. Seems hard, anyone know?)
Another issue: loading time. You pay loading costs at some point. When you pay this cost depends on how the OS works as well as what linking you use. Maybe you'd rather put off paying it until you know you need it.
Note that static-vs-dynamic linking is traditionally not an optimization issue, because they both involve separate compilation down to object files. However, this is not required: a compiler can in principle, "compile" "static libraries" to a digested AST form initially, and "link" them by adding those ASTs to the ones generated for the main code, thus empowering global optimization. None of the systems I use do this, so I can't comment on how well it works.
The way to answer performance questions is always by testing (and use a test environment as much like the deployment environment as possible).
1) is based on the fact that calling a DLL function is always using an extra indirect jump. Today, this is usually negligible. Inside the DLL there is some more overhead on i386 CPU's, because they can't generate position independent code. On amd64, jumps can be relative to the program counter, so this is a huge improvement.
2) This is correct. With optimizations guided by profiling you can usually win about 10-15 percent performance. Now that CPU speed has reached its limits it might be worth doing it.
I would add: (3) the linker can arrange functions in a more cache efficient grouping, so that expensive cache level misses are minimised. It also might especially effect the startup time of applications (based on results i have seen with the Sun C++ compiler)
And don't forget that with DLL's no dead code elimination can be performed. Depending on the language, the DLL code might not be optimally either. Virtual functions are always virtual because the compiler doesn't know whether a client is overwriting it.
For these reasons, in case there is no real need for DLL's, then just use static compilation.
EDIT (to answer the comment, by user underscore)
Here is a good resource about the position independent code problem http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
As explained x86 does not have them AFAIK for anything else then 15 bit jump ranges and not for unconditional jumps and calls. Thats why functions (from generators) having more then 32K have always been a problem and needed embedded trampolines.
But on popular x86 OS like Linux you just don't need to care if the SO/DLL file is not generated with the
-fpic (which enforces the use of the indirect jump tables). Because if you don't, the code is just fixed like a normal linker would relocate it. But while doing this it makes the code segment non shareable and it would need a full mapping of the code from disk into memory and touching it all before it can be used (emptying most of the caches, hitting TLB's) etc. There was a time when this was considered slow ... too slow.
So you would not have any benefit anymore.
I don't recall what OS (Solaris or FreeBSD) gave me problems with my Unix build system because I just wasn't doing this and wondered why it crashed until I applied