Data & Software Blog: tips, tricks and new features

This is the Scientific Computing group’s blog. It hosts various posts highlighting new features, bug fixes and interesting tips for users of the JCMT and EAO’s software and data.

You can subscribe to notifications of all new posts in this blog by joining EAO’s software discussion mailing list. You can join either by going to this address, or by sending an email to

You can also use this mailing list to ask questions of both EAO’s Scientific Computing staff and your fellow JCMT users. It’s currently very low volume.

Print Friendly, PDF & Email

A new dimmconfig that uses PCA

The $STARLINK_DIR/share/smurf directory includes several “dimmconfig” files that package up commonly used groups of configuration parameter values for use by the makemap command.  A new file called dimmconfig_pca.lis has recently been added, which can be combined with  other dimmconfig files to tell makemap to include a PCA model in its iterative algorithm (the PCA model removes noise signals that are correlated between multiple bolometer time-streams). For instance, to use a PCA model when creating a map of a bright extended source, you could run makemap as follows:

% more conf
% makemap config=^conf

To process compact sources, change “bright_extended” above to “bright_compact“.

Using a PCA model can help to reduce the spurious extended structures that often appear in SCUBA-2 maps (although this benefit is bought at the cost of a much extended run time). For instance, below are four 850 um maps of DR 21 – the top row shows the maps made with the basic “bright extended” dimmconfig, and the bottom shows eh results of adding in the new PCA dimmconfig:

Below are the mosaics of the four observations, with the difference map shown in between:

As another example, the following panes show similar maps for three observations of the Serpens South field:

Print Friendly, PDF & Email

Faster PCA

By default, the SMURF makemap command identifies and removes a single correlated background signal – called the common-mode – from all bolometer time-streams. However, in some cases there is clear evidence that there is more than one correlated background signal present in the bolometer time streams. This is particularly evident in POL2 data, where the varying instrumental polarisation causes different parts of the focal plane to see different levels of sky polarisation. For POL2, better maps are created if multiple correlated background signals are identified and removed . This is achieved within makemap using a process called Principal Component Analysis (PCA). The down side to using PCA is that it is very very slow – it is just about practical in the case of POL2 data because of the very low scan speed and consequent very low sample rate. 

A change introduced into SMURF on 1st April  2019 should speed up PCA by a factor of 2 or 3, making POL2 reductions quicker and maybe making PCA background removal practical for non-POL2 data. Maps created using the new PCA system will not be identical to maps made with the old system, but the differences should be well within the noise levels (the pixel variances remain largely unaffected).

To use PCA within a normal run of makemap it is recommended to add the following to your config file:

modelorder = (com,gai,pca,ext,flt,ast,noi) 
pca.pcathresh = -50

It is usually a good idea to mask out the source when calculating the PCA model on the first few iterations, since this seems to aid convergence. The same sort of mask should be used with PCA as is used with AST, but it should only be applied for a few iterations. So for instance for a point source you could add:

pca.zero_circle = 0.01667
pca.zero_niter = 2

This masks the PCA model on the first two iterations using a circle of radius 60 arc-seconds (0.01667 degrees).

Using PCA usually causes makemap to converge more slowly, but often produces maps with lower levels of artificial structures. If pca.pcathresh is negative, the absolute value indicates the number of correlated signals to remove as the background in each bolometer. Smaller numbers result in a lower level of noise reduction in the final map, but faster convergence. Larger numbers result in a higher level of noise reduction in the final map, but slower convergence. The default value is 50, which usually seems to be a reasonable compromise.

The left map below was made with a default common-mode background (no PCA)  and the centre map was made with PCA background removal as shown above. Each observation took about 6 minutes to create without PCA and about 25 minutes with the new faster PCA. The right map shows the difference between the other two maps. All three use the same scaling.

Print Friendly, PDF & Email

Combining POL2 observations with different reference positions

A bug has recently been fixed in the SMURF pol2map command  that could cause the maps created by pol2map to be blurred or show bad negative bowling around bright sources if the input observations do not all use the same reference point. So if you are using pol2map to combine data from POL2 observations with different reference positions, then you should ensure that you are using a version of starlink that includes the bug fix – i.e. was created on or after 10th January 2019. To check this, do:

% more $STARLINK_DIR/manifests/starlink.version

This will show details about your starlink build, including the date. To update it using rsync from Hilo, see these instructions.

The bug caused the pointing corrections determined at “step 1” to be in error by an amount roughly equal to the change in reference position between observations. These bad pointing corrections in turn lead to various problems (bad bowling or blurring) in the maps created at “step 2”. Consequently, it is not possible to correct an existing bad reduction – the only option is to do the whole reduction again from the start (having first ensured that your starlink installation is up to date).

Print Friendly, PDF & Email

Changes to the NDF (.sdf) Data Format

Those of you who carefully read our Starlink release notes (everyone, right?) will have noticed that our last release (2018A) included a new HDF5-based data format.

The 2018A release could read data in the new format, but by default still wrote data in the old format. In preparation for our next release we have now changed the default data format in our codebase. This means if you download the Linux* nightly build from EAO (see you will now start producing data files in the new format.

Our data files will still have the same ‘.sdf’ extension. You can tell if a file is in the old or new format most easily by using the unix command ‘file’. For an old-style .sdf file you will see ‘data’. for the new-style .sdf files you will see: ‘Hierarchical Data Format (version 5) data.

If you need to convert a new style .sdf file to the old format, you can set the environmental variable ‘HDS_VERSION’ to 4, and then run the ndfcopy command on your file. The output file will now be in the old style format. You’ll need to unset the variable to go back to writing your output in the new format.

E.g. in bash, you could do:

export HDS_VERSION=4
ndfcopy myfile.sdf myfile-hdsv4.sdf

If you need to use the old format all the time you could set HDS_VERSION=4 in your login scripts.

If you want to know more detail, you should know that the NDF data structures you use with Starlink are written to disk in a file-format known as HDS. This new file format we are now using is version 5 of the HDS file format, and is based on the widely used Hierarchical Data Format (HDF) standard. It is possible to open and read the new-style files with standard HDF5 libraries — e.g. the python H5py library. There are a few unusual features you may find, particularly if looking at the history or provenance structures. The new format is described in Tim Jenness’ paper (….12..221J/abstract). Our file format and data model are described in the HDS manual ( SUN/92) and the NDF manual (SUN/33). If you’re interested in the history of NDF and how it compares to the FITS file format/data model you could also read Jenness et al’s paper on Learning from 25 years of the extensible N-Dimensional Data Format

We haven’t changed the format that the JCMT or UKIRT write raw data in, so all raw data from both these telescopes will continue to be written in the old format. We also haven’t yet changed the format for new JCMT reduced data available through the CADC archive, but that is likely to happen at some point. We don’t intend to remove the ability for Starlink to read old files. The version of the starlink-pyhds python library available through pypi can read the new-format data files already.

If you have any questions or issues with the new file format you can contact the Starlink developers list, or if you prefer you can directly email the observatory via the  helpdesk AT email address. If you’re subscribed to the JCMT/EAO  email address you can send comments there (see for subscription links/instructions).

*Only the Linux build because we currently have an issue with our OSX nightly build and it is not updating right now– our hardware has died and I haven’t got the temporary replacement up and running yet.

Print Friendly, PDF & Email

POL-2: Beware of using ‘mapvar=yes’ at ‘step 1’

Summary: Do not use mapvar=yes on step 1 of a POL-2 reduction as it can cause negative bowling around bright sources in the I map created at step 2.

Production of polarisation maps from POL-2 data is usually performed using the pol2map command within the Starlink SMURF package, as described in the POL-2 Data Reduction cookbook – SC22. An option is available to create the final I, Q and U variances based on the spread of pixel values in a set of maps created from different individual observations. This option is selected by running pol2map with the parameter setting mapvar=yes. Without this option, output variances are calculated using simple error propagation from the variance values stored within the maps made from individual observations.

When creating POL-2 polarisation maps, it is usual first to run pol2map in ”auto-masking” mode to create an initial map of total intensity (I). This initial map is used simply to define a mask outlining the areas on the sky where bright emission is expected. This mask is then used in subsequent runs of pol2map to (amongst other things) exclude bright bolometer samples from the estimation of the COM, PCA and FLT background levels – including bright samples would bias these estimates. The first run of pol2map is usually referred to as “step 1”, with subsequent runs being “step 2” and “step 3”.  The I map created by step 1 is first converted from total intensity (in pW) into a map of signal-to-noise ratio (SNR). This is done by dividing the pixel data values by the square root of the pixel variance values. This SNR map is then thresholded at a specific value to define the mask containing the source areas.

But the size of the pixel variance values, and consequently the SNR values and the resulting mask,  depends on whether the mapvar=yes option was used when running pol2map. In general, using mapvar=yes causes the variances to be higher, meaning that the SNR values are lower. So there are fewer source pixels above the SNR threshold value. Thus, the mask is less effective at excluding bright bolometer samples from the various background estimates. This means the background is over estimated in the vicinity of bright sources, resulting in negative bowling around bright sources when the background is removed.  For obvious reasons this effect is worse for brighter sources than for weaker sources.

Below are examples of the the externally masked I map created at step 2, based on step 1 I maps created using mapvar=yes (left) and mapvar=no (right). The negative bowling in the centre of the left image is obvious (images provided by Kate Pattle).









Print Friendly, PDF & Email