Setting up the Apple M1 for Native Code Development from the Command Line

Finding the right PATH

Feb 12, 2021

Introduction

This is the first of a few posts on my experiences using a Mac mini/M1 for software development. Since I am an oldie, I am doing this from the command line. If you are using a GUI, that may make things easier or (at least to me) more confusing!

This post covers the apparently simple task of having your environment set up in a way that targets the M1 sanely.

Later posts will cover some “gotchas” in the Arm and emulated x86_64 environments.

The M1 Environment

The Apple M1 is an Apple implementation of an Arm 64-bit (AArch64) architecture processor. To allow machines built with this processor to run code which was built for the x86_64 architecture, MacOS (in the “Big Sur” and later releases) supplies an x86_64 binary translation emulator (“Rosetta 2”). The presence of this emulator means that x86_64 native executables can run on this machine without any change. That is a good thing, since it means we have a useful environment without changing or recompiling any code, but a bad one in terms of potential confusion.

To allow a single executable image (or runtime library) to run natively on either architecture, the OS (and associated tools) support “universal-binaries”; thus a single file can contain both the code required for the AArch64 architecture and also that required for the x86_64 one. In general, a “fat binary” like this could support many more architectures, but the case of interest here is just these two.

Apple utilities, such as the compilers that come with the XCode command line tools, are distributed in this way, so a single executable image found in your $PATH may contain both versions of the tool.

Aside from that (which we’ll see below can be confusing!) the environment is the normal, MacOS “Big Sur” one.

What Works?

At first glance, everything. The installation of Aquamacs runs, bash, python, make and cmake are all there, so things look great.

But…

Things get confusing if you run a shell from inside emacs (as I do, since it provides a stable development environment on all of your machines, accessible without needing to forward a GUI, and my muscle-memory knows the editing keystrokes).

Let’s check the compiler we see in that environment :-

$ which clang 
/Library/Developer/CommandLineTools/usr/bin/clang
$ clang -v
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Hmm, we’re getting the compiler from the XCode tools, but it’s compiling for x86_64, not the AArch64 that we wanted.
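If you would rather check this from a script than by eye, the architecture can be pulled out of the `clang -v` output. This is only a minimal sketch, assuming the output keeps the `Target: <arch>-<vendor>-<os>` line shown above; `target_arch_from_verbose` and `compiler_target_arch` are names made up for this example.

```shell
# Hypothetical helper: read `clang -v` style text on stdin and print the
# architecture part of the "Target:" triple (the text before the first '-').
target_arch_from_verbose() {
    sed -n 's/^Target: \([^-]*\)-.*/\1/p'
}

# Hypothetical wrapper: ask a compiler (default: clang) for its default target.
# clang writes its -v banner to stderr, hence the 2>&1.
compiler_target_arch() {
    "${1:-clang}" -v 2>&1 | target_arch_from_verbose
}
```

Calling `compiler_target_arch` in the emulated shell above should print x86_64, which makes the mismatch easy to test for in a build script.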

What Is Going On?

The arch command gives us a little hint. If we run it, it tells us the environment in which it is running :-

$ arch
i386

So our shell (executing inside emacs) is running in emulation. The “i386” is somewhat misleading; it does mean x86_64! [1]

If we do the same thing in a shell inside a terminal window started from Finder, we see this:-

$ arch
arm64

So, here our shell is running natively.

But, Why Does This Matter?

To make things work more easily for the x86_64 emulated environment, the standard behaviour of MacOS is to maintain the existing process’ architecture at an exec call if possible. Thus, if our shell is running in x86_64 emulation, when we start the compiler (which is a universal binary, remember), the embedded x86_64 executable will be executed. And, that version of the compiler defaults to compiling for the x86_64 target.

So… because we used emacs (which is not yet native), we end up with an x86_64 compiler.

Totally obvious, right!?

But, before you blame emacs, consider that other tools may also not yet be available in AArch64 native form, and that even when they are you need to set up your environment to ensure that you find the right one. Consider cmake, for instance; if you are using brew and haven’t installed the AArch64 native version and changed your paths, you’ll be getting the cmake from /usr/local/bin which is an x86_64 image. It will therefore see that as the default target environment and likely configure the build it is setting up to use that architecture.

What Can We Do?

There are a number of things that we can do that help.

Ensure that our shell always executes as a native, AArch64 shell

We can achieve that by checking the architecture in which the shell is running in our shell startup script (.zshenv or .bashrc, or whatever is appropriate for your shell), and then exec-ing another shell inside an arch command to switch to the AArch64 environment.

Something like this from my .bashrc :-

# Switch to an arm64e shell by default
if [ "$(machine)" != arm64e ]; then
    echo 'Execing arm64 shell'
    exec arch -arm64 bash
fi

Once that is in place, the shell executed inside emacs will be running the AArch64 environment even though emacs isn’t. So now the compiler we get will be the one for the real machine.
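A slightly more defensive variant is sketched below. It assumes that the `sysctl.proc_translated` key reports 1 for a process running under Rosetta translation and is absent elsewhere (so an Intel Mac, or a machine with no `sysctl` at all, is treated as "not translated"). The function names are invented for this sketch, and nothing here is taken from my actual .bashrc.

```shell
# Hypothetical helper: print 1 if this process is being translated, else 0.
# Assumption: sysctl.proc_translated is 1 under Rosetta; if the key or the
# sysctl command is missing we fall back to 0.
translation_state() {
    state=$(sysctl -n sysctl.proc_translated 2>/dev/null) || state=0
    echo "${state:-0}"
}

# Hypothetical helper: succeed only when the given state means "translated".
needs_native_reexec() {
    [ "$1" = "1" ]
}

# Replace the current shell with a native one, but only when translated.
# Intended use near the top of a startup file:  reexec_native_if_translated bash
reexec_native_if_translated() {
    if needs_native_reexec "$(translation_state)"; then
        echo 'Execing arm64 shell' >&2
        exec arch -arm64 "$@"
    fi
}
```

Keeping the decision in `needs_native_reexec` separate from the `exec` makes it possible to test the logic without actually replacing the shell.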

Ensure that our PATH points to AArch64 images

This is important once we install other tools with brew. If we use it to install compilers (for instance if we want more cutting edge LLVM compilers, or support for OpenMP which is not fully enabled in the Apple compilers), then we may have three different versions of the clang command:

A version in the Apple command line tools installed somewhere like /Library/Developer/CommandLineTools/usr/bin/. This compiler is a universal-binary which will choose its default target based on the properties of the process from which it was invoked.
A version in the x86_64 brew environment installed by default somewhere like /usr/local/Cellar/llvm/11.0.0_1/bin/. This compiler is an x86_64 binary targeting x86_64.
A version in the AArch64 brew environment installed by default somewhere like /opt/homebrew/Cellar/llvm/11.0.1/bin/. This compiler is an aarch64 executable targeting aarch64.

If you copied your existing environment from another, older, MacOS machine, then it will certainly not be pointing at the AArch64 brew environment. That older, x86_64, installation will all continue to work, but you won’t be exploiting your new machine to its full extent.

Note, too, that if the brew (or other installation systems such as anaconda) directories are searched before the Xcode ones, you may be running x86_64 versions of tools which are available as universal binaries (and so can run natively) in Xcode. For instance, Xcode provides python3 as a universal binary, while anaconda may not yet be doing that.

$ lipo -info /Library/Developer/CommandLineTools/usr/bin/python3
Architectures in the fat file: /Library/Developer/CommandLineTools/usr/bin/python3 are: x86_64 arm64 
$ /Library/Developer/CommandLineTools/usr/bin/python3 --version
Python 3.8.2
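To see which copy of a tool wins, and what each copy contains, it can help to walk the PATH in search order. The following is a rough sketch in POSIX sh; `list_tool_archs` is a made-up name, and it assumes `lipo -archs` is available to print the architecture names (falling back to a placeholder message when it is not).

```shell
# Hypothetical helper: list every executable called "$1" on PATH, in search
# order, together with the architectures lipo reports for it.
# The body runs in a subshell so the IFS and globbing changes stay local.
list_tool_archs() (
    tool=$1
    IFS=:
    set -f                      # no filename expansion while splitting PATH
    for dir in $PATH; do
        candidate="${dir:-.}/$tool"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            if command -v lipo >/dev/null 2>&1; then
                # Scripts and other non-Mach-O files make lipo fail.
                archs=$(lipo -archs "$candidate" 2>/dev/null) || archs="not a Mach-O file"
            else
                archs="unknown (lipo not found)"
            fi
            printf '%s: %s\n' "$candidate" "$archs"
        fi
    done
)
```

Running `list_tool_archs clang` or `list_tool_archs python3` prints one line per copy found; the first line is the one the shell will actually run.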

Useful Commands

I’m not going to show the man pages for these (and my low Google-fu failed to find good MacOS man pages online, so I won’t give you any links). However, it’s worth knowing about these commands, so that you can then run man on them yourself locally.

When you’re trying to understand what went wrong, these can be useful!

`arch`

The arch command can be used both to see what the current default execution environment is, and to invoke a command for a specific architecture.

$ arch # Show the architecture
i386
$ arch -arch arm64 machine # Run the machine command in arm64
arm64e
$ arch -arch x86_64 machine # Run the machine command in x86_64  
i486

You can also set a default machine preference for the arch command using the ARCHPREFERENCE environment variable. If arch is being used to invoke another command and no architecture is explicitly requested, then the one in $ARCHPREFERENCE will be used.

$ arch
arm64
$ ARCHPREFERENCE=x86_64 arch   
arm64
$ ARCHPREFERENCE=x86_64 arch machine
i486

Note, though, that this is not changing any global default, merely affecting what the arch command does by default. So execution which isn’t mediated by arch is not affected.

`machine`

The machine command simply prints out the architecture on which it is running. It gives a slightly saner answer than arch does! (Though it still thinks x86_64 is i486, as we can see above.) Be aware, though, that while it reports arm64e on the M1, the target architecture that one should normally use is arm64.
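Since arch and machine use different spellings for the same thing, a script that consumes their output may want to normalise it first. Here is a small sketch; `normalize_arch` is a hypothetical name, and treating i386 and i486 as x86_64 is an assumption that only makes sense on an M1 machine like the one described here, where those names stand for the emulated x86_64 environment.

```shell
# Hypothetical helper: map the names printed by `arch` and `machine` onto the
# architecture names one would normally pass to a compiler.
normalize_arch() {
    case "$1" in
        i386|i486|x86_64) echo x86_64 ;;   # assumption: M1 context, so these mean x86_64
        arm64|arm64e)     echo arm64 ;;
        *)                echo "unknown:$1"; return 1 ;;
    esac
}
```

For example, `normalize_arch "$(machine)"` yields a value that can be compared directly against the architecture in a compiler's target triple.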

`brew`

Homebrew now has support for the AArch64 architecture and will install useful packages for software development. However, you will need to change your PATH to use them, since it does not install them in /usr/local by default (which is retained for the x86_64 binaries) but rather in /opt/homebrew. See the Homebrew 3.0.0 release notes.

`file`

As you no doubt know already, file is the command one normally uses to see what the semantic properties of a file are. In the MacOS environment, where universal binaries exist, it can tell us about them in a slightly more verbose manner than lipo’s -info option.

$ file `which bash` 
/bin/bash: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/bash (for architecture x86_64):	Mach-O 64-bit executable x86_64
/bin/bash (for architecture arm64e):	Mach-O 64-bit executable arm64e

`lipo`

lipo is the command line tool which is used to manipulate universal binaries. It can show information about the contents, e.g.

$ lipo -info `which bash`
Architectures in the fat file: /bin/bash are: x86_64 arm64e

However, in addition, it can be used to extract an architecture-specific slice of the file, or to build a universal binary from existing single-architecture binaries. Thus, if you want to build your own universal binary, you will probably need lipo.
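As an illustration of both directions, here is a hedged sketch that compiles a C source file once per architecture and glues the results together with `lipo -create`. The function name `build_universal` and the `.arm64`/`.x86_64` file suffixes are choices made for this example; it assumes a MacOS machine with clang and lipo on the PATH and simply refuses to run elsewhere.

```shell
# Hypothetical helper: build "$2" as a universal binary from C source "$1".
# Returns 2 when the required tools are not available, 1 when a step fails.
build_universal() {
    src=$1
    out=$2
    if [ "$(uname -s)" != Darwin ] \
        || ! command -v clang >/dev/null 2>&1 \
        || ! command -v lipo >/dev/null 2>&1; then
        echo "build_universal: needs MacOS with clang and lipo" >&2
        return 2
    fi
    # One single-architecture image per target.
    clang -arch arm64  -o "$out.arm64"  "$src" || return 1
    clang -arch x86_64 -o "$out.x86_64" "$src" || return 1
    # Combine them into one universal binary and report what it contains.
    lipo -create -output "$out" "$out.arm64" "$out.x86_64" || return 1
    lipo -info "$out"
}

# Going the other way, a single slice can be pulled back out with -thin, e.g.
#   lipo hello -thin arm64 -output hello.arm64only
```

The intermediate single-architecture files are left in place so they can be inspected with file or lipo -info.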

`xcode-select`

The xcode-select command can be used to install the XCode command line tools

$ xcode-select --install

or show where the XCode command line tools are installed.

$ xcode-select -p
/Applications/Xcode.app/Contents/Developer

However, the man page suggests that one should use xcrun to find tools inside a script.

`xcrun`

xcrun also provides information about the XCode environment. It is the recommended way to find XCode tools when configuring other things, as I do here in my script to configure LLVM from inside a build directory inside the LLVM root directory.

# Xcode, Ninja, Make as you prefer.
BUILD_SYSTEM=Ninja
BUILD_TAG=$(echo "$BUILD_SYSTEM" | tr '[:upper:]' '[:lower:]')
INSTALLDIR=${HOME}/software/clang-12.0.0/arm64
# I don't see how to find this programmatically :-(
XCODE_ROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk

cmake ../llvm \
      -G${BUILD_SYSTEM} -B ${BUILD_TAG}_build \
      -DCMAKE_OSX_ARCHITECTURES='arm64' \
      -DCMAKE_C_COMPILER="$(xcrun --find clang)" \
      -DCMAKE_CXX_COMPILER="$(xcrun --find clang++)" \
      -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_INSTALL_PREFIX=$INSTALLDIR \
      -DLLVM_LOCAL_RPATH=$INSTALLDIR/lib \
      -DLLVM_ENABLE_WERROR=FALSE \
      -DLLVM_TARGETS_TO_BUILD='AArch64' \
      -DLLVM_DEFAULT_TARGET_TRIPLE='aarch64-apple-darwin20.3.0' \
      -DDEFAULT_SYSROOT=${XCODE_ROOT} \
      -DLLVM_ENABLE_PROJECTS='clang;openmp;polly;clang-tools-extra;libcxx;libcxxabi'
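The hard-coded SDK path in the script can probably be avoided, since `xcrun --show-sdk-path` (optionally with `--sdk macosx`) prints it. A minimal sketch follows; `sdk_root_or` is a hypothetical helper name, and the fallback argument is only there so the script still produces something when xcrun is unavailable.

```shell
# Hypothetical helper: print the MacOS SDK path reported by xcrun, or the
# fallback given as "$1" when xcrun is missing or reports nothing.
sdk_root_or() {
    fallback=$1
    if command -v xcrun >/dev/null 2>&1 \
        && sdk=$(xcrun --sdk macosx --show-sdk-path 2>/dev/null) \
        && [ -n "$sdk" ]; then
        echo "$sdk"
    else
        echo "$fallback"
    fi
}
```

In the script above this could be used as `XCODE_ROOT=$(sdk_root_or /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk)`, keeping the original path as the fallback.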

What Did We Just Learn?

This environment can be complicated and hard to grok.
There are tools which can help, some of which are MacOS specific, so you may not even know they exist if you’re used to a Linux environment.
There are some horrible hacks (such as re-execing a shell) which can help too.
You will need to modify your environment (and in particular the directories searched in your PATH) to make things work well.
You can get this all to work.
It has got much easier now that brew has AArch64 MacOS support.

What’s Coming Next

In the next blog I’ll cover some architectural and ABI properties of the M1/MacOS combination which may bite you if you don’t know about them!

[1] The arch man page is amusing; it references the arm64 target (whose architecture was announced in October 2011) while simultaneously claiming that the man page dates from July 8 2010!

Oct 31, 2021

Regarding

# I don't see how to find this programmatically :-(

XCODE_ROOT=

You can get this value with `xcrun --show-sdk-path`, optionally including `--sdk macosx`


Mike Herman

Dec 11, 2023

Great! Now I understand why my predominately Fortran code with some small C additions gets its Fortran part in the arm64 architecture and its C part in x86_64. Without your hint I would never have figured it out.


CPU fun
