Difference between reverse inference and decoding (e.g. MVPA) in fMRI

Reverse inference, i.e. the use of brain activation (fMRI) data to infer the engagement of a certain mental function, is heavily criticised (e.g., Poldrack 2011, Neuron).

At the same time, decoding, the process of applying machine learning algorithms to such brain activation patterns in order to infer something about the mental function that elicited this response (e.g. to distinguish between two different conditions in an fMRI experiment), is extensively used and actively developed - multivariate pattern analysis (MVPA) being just the most widespread of these techniques.

The two concepts appear, however, to be one and the same; or rather, the second (decoding) appears to be a special case of the first (reverse inference). Is there an additional nuance to their difference? Also, why is decoding not criticised in the same way as reverse inference?


Short answer: Decoding is not a special case of reverse inference.

The difficulty with interpreting neuroimaging results is that there is a tremendous amount of variability (noise) in the data. For example, say we attempt to determine the brain areas associated with the emotion of romantic love by showing subjects images of close friends (condition 1) or images of their loved ones (condition 2), and comparing the results. Each brain scan may show 5-10 active regions; which regions are active, and to what degree, varies between subjects even in the same condition; and there is variability even in brain scans of the same subject across multiple trials.

To deal with this variability, the first step in just about any neuroimaging experiment's data interpretation process is a statistical analysis. This can range from an "averaging" or "noise-cancellation" analysis, to a multi-voxel / multi-frame machine-learning pattern-matching classifier (MVPA). The data analysis is used to determine a predictor of the independent variable, and also to calculate significance (p-value) based on the consistency level in the data. For example, we may learn that subjects viewing images of their loved ones (condition 2) have 5 areas of their brain reduce activity and 4 areas increase activity relative to subjects viewing images of other close relations (condition 1). Note that the decoding step does not involve "labelling" the pattern as any particular mental state - a classifier like MVPA does not care what the patterns mean, it's just a mathematical function used to distinguish between them.
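
To make the distinction concrete, here is a minimal sketch of what the decoding step alone looks like, using scikit-learn on synthetic stand-in data (an illustration, not the pipeline of any study discussed here). Note that nothing in it assigns a mental-state label; it only measures whether the two conditions are distinguishable:

```python
# Minimal decoding sketch: classify two conditions from voxel patterns.
# Synthetic stand-in data; a real analysis would use per-trial beta estimates.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 500
X = rng.normal(size=(n_trials, n_voxels))  # trial-by-voxel activity patterns
y = np.repeat([0, 1], n_trials // 2)       # labels: condition 1 vs. condition 2
X[y == 1, :20] += 0.5                      # inject a weak multivariate signal

scores = cross_val_score(LinearSVC(C=1.0), X, y, cv=5)  # cross-validated accuracy
print(f"mean cross-validated accuracy: {scores.mean():.2f}")  # > 0.5 if a pattern exists
```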

In a typical cognitive neuroscience experiment, the next step is to attach a label to the mental state defined by the pattern found - this is forward inference. To do this, researchers typically determine the mental state through some other validated measure - for example, a questionnaire or some other test that is known to reliably measure love. The researchers may conclude: "These regions of the brain are associated with the emotion of romantic love when subjects view images of their loved ones." Immediately, a constraining factor arises: This pattern of activation is relative to the control condition only!

While forward inference involves labelling the pattern from the independent variable, reverse inference is used to label the independent variable from an activity pattern. Reverse inference gets into trouble when it is used to label a particular pattern of activation in a different context. For example, "the same areas of the brain associated with the emotion of love in an experiment involving showing subjects images of their loved ones, also show the same pattern of activation when subjects view their iPhones, and therefore we conclude that subjects are in love with their iPhones." Note that statistical analysis of the data (decoding) must take place first in order to be able to make either kind of cognitive inference.

Reverse inferences suffer from several potential pitfalls, among them:

  • A particular pattern of activity usually indicates different mental states in different contexts, so assuming that the same pattern indicates "love" when applied to iPhones is invalid.
  • Patterns isolated using a statistical analysis usually do not fully describe the mental state, so assuming that the pattern is sufficient to indicate "love" is also invalid.
  • Mental state labels themselves are context dependent, subject-dependent, and even temporally variable, so assuming that subjects would interpret their own mental state as "love" is invalid as well.
  • Mental states are not discrete, as evidenced by all the "noise" in neuroimaging data, so subjects may identify their mental state using a variety of different labels depending on what part of their mental state they attend to, "love" being only one.

Reverse inference is subject to these issues no matter what statistical analysis is used; MVPA is not immune. However, many of these issues can be overcome using a variety of techniques. MVPA is very valuable for dealing with the first issue: thanks to much higher "resolution" (level of detail, scope of data, linearity, etc.), this decoding technique is significantly less likely to confuse different mental states that appear to have the same activity pattern at a lower resolution. Another important strategy is using meta-analysis (data gathered from many different studies) to determine how useful a particular pattern is for reliably inferring a mental state. Solving the remaining problems will likely require per-subject training data and even more advanced analysis techniques, but reverse inference has already been a useful methodology in many studies, when carefully applied.
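
Poldrack has framed this selectivity problem in Bayesian terms: the probability that a mental state is engaged given an activation depends on how selectively the region activates for that state and on the prior probability of the state in the context at hand. A toy calculation (all numbers invented purely for illustration) shows why low selectivity and low priors undermine the inference:

```python
# Toy Bayesian reverse inference (illustrative numbers only).
# P(state | activation) depends not just on P(activation | state),
# but on how often the region activates for other states (selectivity)
# and on the prior probability of the state in this context.
def posterior(p_act_given_state, p_act_given_not, prior):
    joint = p_act_given_state * prior
    return joint / (joint + p_act_given_not * (1 - prior))

# Region activates in 80% of "love" trials, but also in 40% of other trials:
print(posterior(0.8, 0.4, prior=0.5))   # ~0.67: weak evidence
# Same likelihoods, but "love" is a priori rare in this context:
print(posterior(0.8, 0.4, prior=0.1))   # ~0.18: activation alone proves little
```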

An additional interesting side-effect of machine-learning pattern analyzers is that they tend to remove the temptation to make unwarranted reverse inferences. In the early days, when patterns of activation were "informal" - based on gross anatomy - it was too easy to make sweeping statements such as: "… the insular cortex of the brain, which is associated with feelings of love… " With MVPA, patterns of activation are practically impossible to describe in human language, so machines are required to make inferences, which are then inherently more objective.


Note: I initially only skim-read the question; I have rewritten my answer as a consequence, and in light of the comments given.

As highlighted by others here, multi-voxel pattern analysis (MVPA) is an application of machine learning, used for decoding vast quantities of complex information (neural activation patterns elicited by particular tasks). This form of decoding may be used to infer a cognitive process, otherwise known as reverse inference.

The problem of reverse inference is largely summed up by the following comment from Poldrack (2011):

The use of reasoning from activation to mental functions, known as “reverse inference”, has been previously criticized on the basis that it does not take into account how selectively the area is activated by the mental process in question.

Poldrack (2011) goes on to explain that informal reverse inference, which is based on a researcher's knowledge, is flawed because an individual's knowledge is limited by what they remember and have read. Additionally, poor interpretations are compounded as they are passed from one researcher to another.

The problem with reverse inference does not arise with general cognitive processes, such as sight, movement, language, or decision-making: general patterns for general processes have been established. The problem, as Poldrack (2011) points out, arises when we interpret patterns in terms of more finely defined cognitions. For example, instead of looking merely at reward processing, we might want to compare patterns of pleasure derived from seeing highly palatable food (e.g., cake) and less palatable food (e.g., fruit). At this level, comparing the data requires a far more specific analytical approach, and a researcher making an informal inference based on personal knowledge would have a strong chance of error.

MVPA deals with data at a far higher resolution than any individual researcher could, and compares data to prior trials or previous experiments (see Fig. 1). However, it is critical to remember that we are comparing participants within similar contexts.

Fig. 1. MVPA diagram of testing and inference (Norman et al., 2006).

Poldrack (2011) gives an excellent example of how to use MVPA, conducted by Kay et al. (2008). In brief, Kay et al. (2008) scanned participants while they viewed 1,750 natural images. In a subsequent session, 120 novel images were shown, and MVPA of the neural data was able to accurately predict which image was being viewed. This method has seen substantial development, to the point that, using a similar approach, researchers have analysed neural patterns recorded from participants' waking brains to decode what those participants were dreaming. Basically, given sufficient quantity and quality of data, and taking account of context, decoding may be used to conduct reverse inference; but this requires previous data, gathered within the same context, for MVPA and other machine-learning methods to compare to.
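
As a toy analogue of the identification logic (not Kay et al.'s actual receptive-field model, which predicts voxel responses from image features), one can correlate an observed response pattern with a predicted pattern for each candidate image and pick the best match:

```python
# Toy image identification: correlate a new response pattern with predicted
# patterns for each candidate image and select the best-matching one.
import numpy as np

rng = np.random.default_rng(8)
n_images, n_voxels = 120, 500
predicted = rng.normal(size=(n_images, n_voxels))  # one predicted pattern per image
true_idx = 42
observed = predicted[true_idx] + rng.normal(scale=1.0, size=n_voxels)  # noisy response

r = np.array([np.corrcoef(observed, p)[0, 1] for p in predicted])
print(int(np.argmax(r)) == true_idx)  # identification succeeds when signal dominates
```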

This does not mean that machine learning cannot be wrong: it is a statistical method in which humans set the parameters, and it can fail if incorrectly applied. The answer by Arnon Weinberg above accurately describes the issues and pitfalls that must be avoided to make this method viable for reverse inference.

Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424-430. http://doi.org/10.1016/j.tics.2006.07.005



MVPA decoding methods could successfully distinguish between conditions because of differences in the multivariate means of response patterns between conditions or because of differences in the variability of those patterns across trials. Perhaps counterintuitively, the latter could constitute signal rather than noise (e.g. if this information is read out and used by other brain areas). However, differences in response variability between conditions could also easily arise from extraneous factors of the sort described in this section (e.g. differences between conditions in how equally trials are distributed across runs). Thus, it is critically important to minimize factors that could lead to differences in both the mean and variability of response patterns between conditions. See Hebart and Baker (2018) for an in-depth discussion of this issue.

Note that if the trial order is not fully randomized, response patterns should only be compared across runs, for both decoding and similarity analyses; see Mumford et al. (2014) for further discussion of this issue.
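
A minimal sketch of how this run-wise separation is typically enforced in practice (scikit-learn is assumed here for illustration; it is not named in the text above):

```python
# Keep training and test patterns in separate runs (leave-one-run-out).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
n_runs, trials_per_run, n_voxels = 8, 12, 300
X = rng.normal(size=(n_runs * trials_per_run, n_voxels))
y = np.tile([0, 1], n_runs * trials_per_run // 2)    # condition labels
runs = np.repeat(np.arange(n_runs), trials_per_run)  # run membership per trial

# LeaveOneGroupOut guarantees that no run contributes to both training and
# test sets, so run-specific confounds cannot leak across the split.
scores = cross_val_score(LinearSVC(C=1.0), X, y, groups=runs,
                         cv=LeaveOneGroupOut())
print(scores.mean())  # ~0.5 here, since the synthetic data carry no signal
```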


METHODS

Participants

Experimental procedures were approved by the University of New South Wales human research ethics committee (HREC No. HC12030). All methods in this study were performed in accordance with the guidelines and regulations from the Australian National Statement on Ethical Conduct in Human Research (www.nhmrc.gov.au/guidelines-publications/e72). All participants gave informed written consent to participate in the experiment. We tested 15 participants (four women) aged 29.6 ± 1.4 (mean ± SEM) years. For analyses discriminating successful from failed suppression trials, a subset of eight participants (two women, aged 32 ± 2.1 years) was considered using a selection criterion of having at least 25% of failed or successful suppression trials. In other words, participants with failed/successful trial ratios more skewed than 1:3 were discarded.

We selected the sample size based on previous studies (Koenig-Robert & Pearson, 2019; Aso et al., 2016; Soon, Brass, Heinze, & Haynes, 2008) to meet standard criteria of statistical power. We conducted a post hoc power analysis to ascertain the power achieved, using G*Power (Faul, Erdfelder, Buchner, & Lang, 2009). For GLM analyses, the power achieved was at least 0.91 at the voxel level (n = 15). For the decoding analysis discriminating the content of successful suppression, we achieved a power of at least 0.79 to detect differences in the paired t test for relevant conditions (n = 8).

Functional and Structural MRI Parameters

Scans were performed at the Neuroscience Research Australia facility, Sydney, Australia, in a Philips 3-T Achieva TX MRI scanner using a 32-channel head coil. Structural images were acquired using a turbo field echo sequence consisting of 256 T1-weighted sagittal slices covering the whole brain (flip angle = 8°, matrix size = 256 × 256, voxel size = 1 mm isotropic). Functional T2*-weighted images were acquired using an EPI sequence with 31 slices (flip angle = 90°, matrix size = 240 × 240, voxel size = 3 mm isotropic, repetition time = 2000 msec, echo time = 40 msec).

Suppression/Imagery Task

We adapted the behavioral task from a previous study from our group (Kwok et al., 2019) to satisfy fMRI requirements. We instructed participants to either imagine or suppress (avoid imagining) the visual thought of either a red apple or a green broccoli (see Figure 1A). Each trial started with a written object cue reading “green broccoli” or “red apple” for 2 sec. After this, a task cue was shown, reading either “imagine” or “suppress,” for 2 sec. Participants were instructed to either visualize the cued object as vividly as they could (imagery period) in the imagine condition or avoid thinking about the cued object (suppression period) in the suppress condition for 12 sec. Importantly, we extensively instructed participants not to use object substitution (imagining another item to avoid imagining the cued object), as we did in our previous behavioral study (Kwok et al., 2019), where we tested both suppression and substitution strategies. We previously observed that object substitution led to very different outcomes compared to object suppression (i.e., substitution led to better thought control). A fixation point was shown on the screen, and participants were required to fixate. In the suppression condition, participants were instructed to press a button as soon as they detected that the visual thought of the to-be-suppressed object had appeared in their minds. We labeled such events as “suppression breaks,” and the trial was labeled as a failed suppression trial. The suppression break button could be pressed multiple times within the 12 sec (Supplementary Figure S4), thus representing multiple suppression breaks. After the imagery/suppression period, a prompt asking participants to rate vividness (from 1 to 4, with 4 = strongest vividness) was presented after imagery trials and failed suppression trials. Participants responded by pressing one of the four buttons on two response boxes. No vividness question was shown after successful suppression trials, which were automatically labeled as having vividness = 0. In failed suppression trials, whenever multiple suppression breaks were reported, we instructed participants to rate the most vivid suppression break event. After reporting the vividness of the visual thought (if required), an intertrial interval of 10 sec was observed, during which the word “rest” appeared on the screen. In each run of 5 min, three trials of each type (imagine/suppress apple/broccoli) were tested, yielding 12 trials. Trials were pseudorandomized within a run.

Imagery/suppression fMRI task. (A) Imagery/visual thought suppression. Every trial started with a written cue indicating the object to be imagined or suppressed (either a green broccoli or a red apple) for a duration of 2 sec. After this, the task instruction was presented (“Imagine” or “Suppress”) for 2 sec. The fixation point remained on the screen for 12 sec, during which the participants tried to either visualize the cued object as vividly as possible or suppress the visual thought of it. In suppression trials, participants pressed a designated button (the same button irrespective of the object to be suppressed) to report a suppression break event, that is, when the mental image of the object to be suppressed appeared in their minds. In imagery trials and suppression trials with suppression breaks, participants were asked to report the subjective intensity of the visual thought experienced on a vividness scale from 1 = low to 4 = high. In suppression trials with no suppression breaks, the vividness prompt was not shown and vividness for that trial was assigned to 0. After every trial, an intertrial interval of 10 sec was observed; a fixation point and the word “rest” were displayed on the screen. (B) Vividness rating in suppression trials for each participant. Suppression vividness from 0 = suppression success to 4 = highly vivid suppression break, as the percentage of trials for every participant. Participants had a wide range of suppression break ratios. For analyses comparing suppression success and failure, only participants having at least 25% of suppression breaks or successful suppressions were considered, which corresponded to eight participants (marked with a ★). (C) Vividness rating in imagery trials for each participant. Unlike the vividness ratings for suppression trials, vividness ratings in the imagery trials were more homogeneous across participants. This suggests that the differences across participants in vividness ratings in the suppression conditions correspond to interindividual differences in thought control (Kwok et al., 2019) rather than inconsistencies in the vividness report.


Perception Task

We presented flickering natural images of a broccoli and an apple against a black background at 4.167 Hz, at three different perceptual intensities (40%, 60%, and 80% transparency), to maximize subsequent classifier generalization ability (Bannert & Bartels, 2013). The images were retrieved via Google image search (filtered for images labeled for reuse with modification) and were presented inside a rectangle (the same as used in the imagery/suppression task; Figure 1) that included a fixation point at the center. Within a run of 3 min, we presented the flickering images in a blocked manner, interleaved with fixation periods of 15 sec each (apple: 15 sec, rest: 15 sec, broccoli: 15 sec, rest: 15 sec, etc.). Importantly, participants performed an attention task consisting of detecting a change in fixation point brightness (+70% for 200 msec). Fixation changes were allocated randomly during a run, from one to four instances. Participants were instructed to press any of the four buttons as soon as they detected the changes. Participants showed high performance in the detection task (d′ = 2.89 ± 0.15 SEM).

Functional Mapping of Retinotopic Visual Areas

To functionally determine the boundaries of visual areas from V1 to V4 independently for each participant, we used the phase-encoding method (Warnking et al., 2002; Sereno et al., 1995). Double wedges containing dynamic colored patterns cycled through 10 rotations in 10 min (retinotopic stimulation frequency = 0.033 Hz). To ensure deployment of attention to the stimulus during the mapping, participants performed a detection task: pressing a button upon seeing a gray dot anywhere on the wedges.

Experimental Procedures

We performed the three experiments in a single scanning session lasting about 1.5 hr. Stimuli were delivered using an 18-in. MRI-compatible LCD screen (Philips ERD-2, 60-Hz refresh rate) located at the end of the bore. Participants held one dual-button response box in each hand (Lumina, Cedrus), which was used to record all responses. All stimuli were delivered, and responses gathered, using Psychtoolbox 3 (Brainard, 1997; Pelli, 1997) for MATLAB (The MathWorks Inc.) with in-house scripts. Participants' heads were restrained using foam pads and adhesive tape. Each session followed the same structure: first, structural scanning, followed by retinotopic mapping (10 min); then the perception task was alternated with the imagery/suppression task until three runs of the perception task were completed (3 min per run); then the imagery/suppression task was repeated until eight runs in total were completed (5 min per run). Pauses were allowed between runs. The first four volumes of each functional run were discarded to account for the equilibrium magnetization time, and each functional run started with 10 sec of fixation.

We interleaved perception and imagery/suppression blocks to remove order effects and to increase participants' engagement. For the same reason, we opted against organizing suppression and imagery trials in different blocks. Pseudorandomizing suppression and imagery trials within a block ensured that participants would not ignore the cue in suppression trials, because the task is only revealed after the cue. In addition, by randomizing the order of suppression and imagery trials, we avoided clustering suppression trials together, which proved to be exhausting according to postexperiment interviews from our previous study.

Visual ROI Functional Definition

fMRI retinotopic mapping data were analyzed using the fast Fourier transform (FFT) in MATLAB. The FFT was applied voxel-wise across time points. The complex output of the FFT contained both the amplitude and phase information of the sinusoidal components of the BOLD signal. Phase information at the frequency of stimulation (0.033 Hz) was then extracted, thresholded by its amplitude (signal-to-noise ratio ≥ 2), and overlaid on each participant's cortical surface reconstruction obtained using FreeSurfer (Fischl et al., 2004; Fischl, Sereno, & Dale, 1999). We manually delineated boundaries between retinotopic areas on the flattened surface around the occipital pole by identifying voxels showing phase reversals in the polar angle map, representing the horizontal and vertical visual meridians. In all participants, we clearly defined four distinct visual areas, labeled V1, V2, V3 (specifically its ventral part, known as VP), and V4. All four retinotopic labels were then intersected with the activation from the perceptual blocks (broccoli/apple > fixation, p < .001, false discovery rate [FDR] corrected), thus restricting the ROI to the foveal and parafoveal (∼5.5° of visual angle) representations of each visual area.
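
For illustration, a compact sketch of this voxel-wise FFT analysis on synthetic data (array shapes, the exact SNR definition, and the threshold handling are simplifying assumptions):

```python
# Sketch of phase-encoded retinotopy analysis: voxel-wise FFT, then phase at
# the stimulation frequency, thresholded by an amplitude-based SNR.
import numpy as np

tr = 2.0                      # seconds per volume
n_vols, n_voxels = 300, 1000  # 10 min at TR = 2 sec
bold = np.random.default_rng(2).normal(size=(n_vols, n_voxels))

spec = np.fft.rfft(bold, axis=0)
freqs = np.fft.rfftfreq(n_vols, d=tr)
k = np.argmin(np.abs(freqs - 1 / 30))  # bin nearest the 0.033-Hz stimulation frequency

amp = np.abs(spec)
# One common SNR definition: amplitude at the stimulus frequency divided by
# the mean amplitude at the other (non-DC) frequencies.
snr = amp[k] / np.mean(np.delete(amp[1:], k - 1, axis=0), axis=0)
phase = np.angle(spec[k])              # polar-angle estimate per voxel
mapped = phase[snr >= 2.0]             # keep voxels passing the SNR threshold
print(mapped.shape)
```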

Suppressed Object Information Containing ROI Definition

We used a decoding approach to define ROIs bearing information about the content (apple vs. broccoli) of suppressed trials. We used these ROIs to test whether similar representational content was shared between imagery and successful suppression (Figure 4) and between perception and successful suppression (Supplementary Figure S5). To define these ROIs, regressors for apple and broccoli were extracted from every run from the suppression trials (independent of success) using 12-sec boxcars (locked to the beginning of the suppression period). We used a leave-one-run-out cross-validation scheme (see MVPA section for details) and a searchlight approach (3 voxels of radius). ROIs containing information about the contents of suppression were defined as those reaching a classification accuracy of Z > 2 (one-sample t test against chance: 50%) at the voxel level. We then corrected for multiple comparisons using cluster-extent based thresholding employing Gaussian random field theory at p < .05. Only two ROIs satisfied these statistical criteria: one in the inferior frontal gyrus (IFG) and one in the LOC (Figure 4). Importantly, this ROI definition is orthogonal to the target cross-decoding analysis where we tested the mutual informational content between imagery and successful suppression, as the training sets of both analyses are independent from each other: suppression trials for the ROI definition and imagery trials for the cross-decoding analysis.

Lateralization Analysis
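
The definition of the lateralization index (LI) itself appears to have been lost from this excerpt. A standard form consistent with the endpoints stated below, assuming L and R denote suprathreshold activation measures in the left and right hemispheres, would be:

```latex
\mathrm{LI} = \frac{R - L}{R + L}, \qquad \mathrm{LI} \in [-1, +1].
```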

Thus, LI = −1 represents fully left-lateralized effects, whereas LI = +1 represents fully right-lateralized effects.

FMRI Signal Preprocessing

All data were analyzed using SPM12 (Wellcome Trust Centre for Neuroimaging). We realigned functional images to the first functional volume and high-pass filtered (128 sec) to remove low-frequency drifts in the signal.

Imagery versus Suppression GLM Analysis

Data were spatially normalized into the Montreal Neurological Institute template and spatially smoothed using an FWHM 8-mm 3-D Gaussian kernel. We generated regressors for each condition (imagery and suppression, independent of the imagined/suppressed object) for each run independently. We used boxcar functions of 12 sec, time-locked to the beginning of the imagery or suppression periods, to model each trial with the canonical hemodynamic response function as basis function. Vividness of the imagery and suppression trials was modeled using parametric modulators as it has been shown that brain activations are dependent on the vividness of imagery (Dijkstra, Bosch, & van Gerven, 2017). GLMs were used to test differences between imagery and suppression conditions. Participants' (n = 15) estimates (betas) of the mass-univariate GLM were fed into a second-level two-sample t-test analysis.
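
As an illustration of how such a regressor is built (a sketch using a common double-gamma approximation to the canonical HRF, not SPM's exact implementation; onsets and scan counts are invented):

```python
# Sketch of one condition regressor: a 12-sec boxcar at each trial onset,
# convolved with a canonical double-gamma HRF (standard parameter choices).
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 150
t = np.arange(0, 32, tr)
hrf = gamma.pdf(t, 6) - 1 / 6 * gamma.pdf(t, 16)  # peak ~5 sec, undershoot ~15 sec
hrf /= hrf.max()

onsets = [10, 60, 110]                 # trial onsets in seconds (illustrative)
boxcar = np.zeros(n_scans)
for onset in onsets:
    i = int(onset / tr)
    boxcar[i:i + int(12 / tr)] = 1.0   # 12-sec boxcar per trial

regressor = np.convolve(boxcar, hrf)[:n_scans]  # one column of the design matrix
# A parametric modulator would scale each trial's boxcar by its (mean-centered)
# vividness rating before the same convolution.
```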

Successful versus Failed Suppression GLM Analysis

The analysis was performed as described in the previous paragraph except for the following differences. We generated regressors for successful and failed suppression (independent of the suppressed object) for each run independently. We used boxcar functions of 12 sec (locked to the beginning of the suppression period) and 1 sec (time-locked to the suppression break events) to model successful suppression and suppression breaks, respectively, to capture their respective sustained and transient natures (Mitchell et al., 2007). Only participants having at least 25% of successful or failed suppression trials (n = 8) were considered to have enough data to estimate the parameters.

GLM Analysis for MVPA

Data were analyzed in their native space, without spatial normalization and smoothing to avoid disrupting information contained in the spatial patterns of activation (Hebart, Görgen, & Haynes, 2015). For the task decoding (Figures 2C and 3B), we estimated GLM for imagery versus suppression and successful versus failed suppression as described above. For the content decoding (Figure 4), regressors for apple and broccoli were estimated using boxcar functions (15 sec for the perception trials and 12 sec for the imagery and successful suppression conditions). This analysis was performed on the subset of participants having at least 25% of successful or failed suppression trials (n = 8).

Imagery and suppression engage two different networks. (A) Imagery > Suppression contrast. Imagery activations (red) were found in high-hierarchy visual areas (InfTG) bilaterally. Left-lateralized imagery-driven activations were also found in executive areas (InfFS and SupFS) and attention-related areas (IPS). Suppression (blue), on the other hand, was associated with right-lateralized activations in executive (ACC, SupFG, MidFS), high visual (STS), and multimodal (AngG) areas. All activations are at p < .001 (voxel level) and p < .05 cluster-level correction (Gaussian random field theory) for multiple comparisons. (B) LI for the Imagery > Suppression contrast. LI was computed from the absolute value of the significant activations across hemispheres (see Methods section for details). Imagery activations were predominantly left lateralized: mean = −0.35, two-tailed t test, t(14) = 3.57, p = .003, 95% CI [0.14, 0.57] (uncorrected), consistent with previous reports. Suppression-related activations were, on the other hand, predominantly right lateralized: mean = 0.35, two-tailed t test, t(14) = −2.61, p = .02, 95% CI [−0.64, −0.06], uncorrected. LIs for imagery and suppression were significantly different: two-tailed t test, t(14) = 4.1, p = .001, 95% CI [0.34, 1.08]. (C) Task decoding. Visual ROIs contained useful information to reliably classify (above 80% accuracy) imagery from suppression trials, thus indicating that these conditions engage visual areas differently. V1: 88.36%, one-tailed t test, t(13) = 14.67, p = 10⁻⁶, 95% CI [83.73, inf]; V2: 90.04%, t(13) = 11.38, p = 10⁻⁶, 95% CI [83.8, inf]; V3: 88.66%, t(13) = 13.59, p = 10⁻⁶, 95% CI [83.62, inf]; V4: 80.95%, t(13) = 9.7, p = 10⁻⁶, 95% CI [75.16, inf]; all p values FDR corrected, q = 0.05. Error bars correspond to +1 SEM. AngG = angular gyrus; InfFS = inferior frontal sulcus; InfTG = inferior temporal gyrus; IPS = intraparietal sulcus; MidFS = middle frontal sulcus; PosCS = postcentral sulcus; SupFG = superior frontal gyrus; SupFS = superior frontal sulcus.


Failed suppression is correlated with activations in visual and memory areas. (A) Failed > Successful suppression contrast. Failed suppression was associated with posterior activations along the visual stream, in areas such as the MidOC and MedOTS, and in memory-related areas such as the ParHC and the hippocampus (not shown). On the other hand, successful suppression was associated with anterior activations in executive areas such as the MidFS, the Opc, and ACC. These results indicate that control over suppressed thoughts reflects an engagement of executive control areas, whereas failure to suppress thoughts is accompanied by hyperactivity of visual and memory-related areas. All results are p < .001 (voxel level) and p < .05 cluster-level correction (Gaussian random field theory) for multiple comparisons. (B) Task decoding. Visual ROIs contained useful information for classifying failed from successful suppression trials. V1: 68.95%, one-tailed t test, t(7) = 6.34, p = 2.66 × 10⁻⁴, 95% CI [63.28, inf]; V2: 71.38%, t(7) = 9.08, p = 4.13 × 10⁻⁵, 95% CI [66.91, inf]; V3: 80.5%, t(7) = 11.3, p = 2.06 × 10⁻⁵, 95% CI [75.39, inf]; V4: 65.87%, t(7) = 4.02, p = .003, 95% CI [58.3886, inf]; all p values FDR corrected, q = 0.05. Error bars correspond to +1 SEM. Ins = insula; MidOC = middle occipital cortex; MedOTS = medial occipito-temporal sulcus; Opc = operculum; ParHC = parahippocampal gyrus.


The contents of subjective successful suppression are decodable using information from imagery. To test whether subjectively successfully suppressed thoughts shared informational content with imagery representations, we performed a cross-decoding analysis. We thus attempted to decode the content of successfully suppressed thoughts (broccoli or apple) using classifiers trained on imagery trials, in two ROIs where the contents of suppression were most readily extractable (ROI threshold Z > 2, Gaussian random field theory–corrected at the cluster level at p < .05; see Methods section for details). The contents of successfully suppressed thoughts were decoded above chance using patterns from imagery trials in the LOC ROI (65.6% accuracy, one-tailed t test, t(7) = 2.53, p = .0443, 95% CI [50.2, inf], FDR-corrected q = 0.05) but not in the IFG ROI (49.48%). These results indicate that subjectively, successfully suppressed thoughts contain similar information to imagery representations (arguably visual in nature, as contained in visual areas). Error bars correspond to ±1 SEM.


We used a well-established decoding approach to extract information related to each condition contained in the pattern of activation across voxels of a given participant, using The Decoding Toolbox (Hebart et al., 2015). For the task decoding (Figures 2C and 3B), we used a leave-one-run-out cross-validation scheme. We trained a linear support vector machine on all runs except one and then tested on the remaining one. We repeated this procedure until all runs had been used as the test set and then averaged the results across validations (eightfold). Using this approach, we tested whether information about the task could be decoded from functionally defined visual areas (V1 to V4; see Visual ROI Functional Definition section for details). For the content decoding analysis (Figure 4), we employed cross-classification to generalize information between imagery and the successful suppression trials. We thus trained on the ensemble of the imagery runs and tested on the ensemble of the successful suppression trials. The same was done for the perception–successful suppression analysis (Supplementary Figure S5), where the ensemble of perception runs was used for training. No cross-validation was used here, as the data sets were independent; thus, there was no risk of overfitting. We employed an ROI approach to test for common representational content in areas functionally defined as containing suppressed-object information (see Suppressed Object Information Containing ROI Definition section for details). Decoding accuracies were averaged across runs and tested against chance level (50%) using a one-sample t test across participants.
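
The cross-classification logic reduces to "fit on one data set, score on the other." A minimal sketch on synthetic stand-in arrays (not the authors' code, which used The Decoding Toolbox):

```python
# Sketch of cross-classification: train on imagery patterns, test on
# successful-suppression patterns (independent sets, no cross-validation).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_voxels = 200
X_imagery = rng.normal(size=(48, n_voxels))   # imagery trials (training set)
y_imagery = np.tile([0, 1], 24)               # 0 = apple, 1 = broccoli
X_suppress = rng.normal(size=(20, n_voxels))  # successful-suppression trials
y_suppress = np.tile([0, 1], 10)

clf = SVC(kernel="linear", C=1.0).fit(X_imagery, y_imagery)
acc = clf.score(X_suppress, y_suppress)       # cross-decoding accuracy vs. 50% chance
print(acc)
```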

Statistical Analysis on Brain Images

All second-level (across participants) brain statistical images (derived from SPM or decoding) were thresholded at the voxel level at p < .001, as recommended in previous studies (Woo, Krishnan, & Wager, 2014). We then corrected for multiple comparisons using cluster-extent-based thresholding employing Gaussian random field theory (Worsley et al., 1996; Friston, Worsley, Frackowiak, Mazziotta, & Evans, 1994) at p < .05, as implemented in the FMRIB Software Library (Smith et al., 2004). Importantly, these thresholds have been shown to keep false-positive rates within nominal levels (Eklund, Nichols, & Knutsson, 2016).


METHODS

Functional Experiment

Participants

Twenty participants were recruited from the University of York. One participant's data were excluded because of excessive motion artifacts, leaving 19 participants in the final analysis (11 women; mean age = 23.67 years, range = 18–37 years). Participants were native British speakers, right-handed, and had normal or corrected-to-normal vision. Participants gave written informed consent and were reimbursed for their time. The study was approved by the York Neuroimaging Centre ethics committee at the University of York.

Design

The functional experiment contained six experimental conditions, in a 2 (concept: dog, car) × 3 (type of imagery: auditory, visual, and conceptually complex context) design (see Supplementary Material A2 for a full list of experimental conditions).

Stimuli

Experimental stimuli consisted of (i) six verbal conceptual prompts that referred to each of our six experimental conditions (e.g., Dog Sound, which cued participants to imagine what a dog sounded like); (ii) visual and auditory noise that was presented throughout experimental conditions and rest periods (Gaussian visual noise was generated through PsychoPy (2.7), and auditory white noise was generated through Audacity software, Version 2.0.0); and (iii) target images/sounds. The targets used in this paradigm were piloted before fMRI scanning on a separate group of participants (n = 24) to determine the average length of time taken to detect a target (image or sound) emerging through noise (see Supplementary Material A1 and Table A2 for a full description of the pilot experiment). From this pilot, 10 images were selected for each of our six experimental conditions (Dog Visual-Features, Car Visual-Features, Dog Sound, Car Sound, Dog Context, and Car Context) based on statistically similar RTs for detecting the item emerging through noise (see Supplementary Material A3 for a full list of stimuli). Images were detected, on average, at 2861 msec and sounds at 2912 msec (see Table 1). The fMRI scan, therefore, allowed 3000 msec for participants to detect whether an item emerging through noise matched the content of their imagery.

Table 1. RTs and accuracy for each condition in the fMRI experiment.

Condition      RT (msec)    Acc (%)
Car sound      2748 (713)   82.11 (16.53)
Dog sound      2753 (552)   76.84 (12.04)
Car visual     2704 (204)   83.68 (11.64)
Dog visual     2620 (241)   82.63 (9.91)
Car context    2754 (211)   76.76 (12.62)
Dog context    2569 (250)   79.61 (14.71)

Standard deviation in parentheses. Acc = percentage accuracy.

Task Procedure

Before being scanned, participants completed a practice session identical to one scanning run. After this practice run, participants were asked to describe what they had been focused on during the pure-noise trials, to ensure they were imagining the relevant concepts. For the in-scanner task, stimuli were presented in four independent runs. Within each scanning run, participants were presented with a cue word (e.g., Sound DOG) and instructed to imagine this concept in the presence of visual and auditory noise; for instance, they were told to imagine the sound of a dog barking, growling, yelping, and so forth. They were asked to continue imagining the relevant visual or auditory properties until the stimulus appeared. Task instructions were presented for 3 sec. A variable number of images then followed, each displaying visual and auditory noise (see Figure 1). Within the blocks, the pure-noise images were each shown for 3 sec. Following a variable length of time (between 6 and 12 sec after the initial cue), a target image or sound began to emerge through the noise (at the rate outlined in the pilot experiment described above). Participants were instructed to respond with a button press (yes/no) indicating whether the target item emerging through the visual and auditory noise was related to what they had been imagining based on the cue word. Participants were given 3000 msec to respond to this item. The block automatically ended after this image. This design afforded us the high signal sensitivity found with block designs, combined with unpredictability to keep participants cognitively engaged.

The basic design of the task, in which stimuli appeared through noise, was adapted from a previous fMRI study examining internally generated conceptual retrieval (see Coutanche & Thompson-Schill, 2015). One advantage of this method is that it required participants to sustain and search for a particular concept over time, ensuring that there were several seconds of data available per trial for the decoding analyses. In addition, because the onset of the stimulus was subtle within the visual and auditory noise, participants were encouraged to engage in semantically driven retrieval before the presentation of the target.

Each experimental condition (e.g., “Dog Sound”) occurred twice in a run (giving eight blocks for each condition across the experiment). Blocks were presented in a pseudorandomized order so the same cue did not immediately repeat, and blocks were separated by 12-sec fixation. During the fixation period, the visual and auditory noise were also presented to create an active baseline. Fifty percent of the items emerging through noise did not match the preceding cue (i.e., four of eight were foils), ensuring that participants focused on the specific target. To encourage participants to pay attention from the very start of every block, an additional short block was included in each run, in which an item emerged through noise after only 3 sec, followed by 12 sec of fixation. These blocks were disregarded in the analysis.

Acquisition

Data were acquired using a GE 3T HD Excite MRI scanner at the York Neuroimaging Centre, University of York. A Magnex head-dedicated gradient insert coil was used in conjunction with a birdcage radio-frequency coil tuned to 127.4 MHz. A gradient-echo EPI sequence was used to collect data from 38 bottom–up axial slices aligned with the temporal lobe (repetition time [TR] = 2 sec, echo time [TE] = 18 msec, field of view [FOV] = 192 × 192 mm, matrix size = 64 × 64, slice thickness = 3 mm, slice gap = 1 mm, flip angle = 90°). Voxel size was 3 × 3 × 3 mm. Functional images were coregistered onto a T1-weighted anatomical image from each participant (TR = 7.8 sec, TE = 3 msec, FOV = 290 mm × 290 mm, matrix size = 256 mm × 256 mm, voxel size = 1.13 mm × 1.13 mm × 1 mm) using linear registration (FLIRT, FMRIB Software Library [FSL]). This sequence was chosen because previous studies employing it have produced an adequate signal-to-noise ratio in regions prone to signal dropout, such as ATL (e.g., Murphy et al., 2017; Coutanche & Thompson-Schill, 2015).

To ensure that our ROIs had sufficient signal to detect reliable fMRI activation, the temporal signal-to-noise ratio (tSNR) for each participant was calculated by dividing the mean signal in each voxel by the standard deviation of the residual error time series in that voxel (Friedman, Glover, & The FBIRN Consortium, 2006). tSNR values were averaged across the voxels in both ATL and mPFC, regions that suffer from signal loss and distortion due to their proximity to air-filled sinuses (Jezzard & Clare, 1999). Mean tSNR values, averaged across participants, were as follows: ATL, 82.85; mPFC, 97.14. The percentage of voxels in each ROI that had “good” tSNR values (>20; Binder et al., 2011) was above 97% for all ROIs: ATL, 97.19%; mPFC, 99.24%. These values indicate that the tSNR was sufficient to detect reliable fMRI activation in all ROIs (Binder et al., 2011).
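
The tSNR computation described above is simple enough to sketch directly (synthetic data; the residual here is a stand-in for the GLM's residual time series):

```python
# tSNR: mean signal per voxel divided by the standard deviation of the
# residual error time series in that voxel.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=100, scale=2, size=(180, 1000))  # time x voxels
resid = data - data.mean(axis=0)  # stand-in for model residuals
tsnr = data.mean(axis=0) / resid.std(axis=0)
print(tsnr.mean(), (tsnr > 20).mean() * 100)  # mean tSNR and % of "good" voxels
```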

Preprocessing

Imaging data were preprocessed using the FSL toolbox (www.fmrib.ox.ac.uk/fsl). Images were skull-stripped using a brain extraction tool (Smith, 2002) to remove nonbrain tissue from the image. The first five volumes (10 sec) of each scan were removed to minimize the effects of magnetic saturation, and slice-timing correction was applied. Motion correction (MCFLIRT; Jenkinson, Bannister, Brady, & Smith, 2002) was followed by temporal high-pass filtering (cutoff = 0.01 Hz). Individual participant data were first registered to their high-resolution T1 anatomical image and then into standard space (Montreal Neurological Institute, MNI152); this process included trilinear interpolation of voxel sizes to 2 × 2 × 2 mm. For univariate analyses, data were additionally smoothed (Gaussian FWHM 6 mm).

Multivariate Pattern Analysis

Analysis was focused on the moments when participants were imagining the target cues (e.g., thinking about what a dog looked like or what a car sounded like). The condition onset and duration were taken from the first pure-noise trial in each block (after the initial cue) to the end of the last pure-noise trial (before the item began to emerge through the noise). The response to each of the six conditions was contrasted against the active rest baseline (periods of auditory and visual noise where participants were not cued to imagine concepts). Boxcar regressors for each condition, for each run, in the general linear model were convolved with a double-gamma hemodynamic response function (FEAT, FSL). Regressors of no interest were also included to account for head motion within scans. MVPA was conducted on spatially unsmoothed data to preserve local voxel information. For each voxel in the brain, we computed a linear support vector machine (LIBSVM, with fixed regularization hyperparameter C = 1) with fourfold (leave-one-run-out) cross-validation, implemented in custom Python scripts using the pyMVPA software package (Hanke et al., 2009). A support vector machine was chosen to combat overfitting by limiting the complexity of the classifier (Lewis-Peacock & Norman, 2014). The classifier was trained on three runs and tested on the independent fourth run; the testing set was then alternated for each of four iterations. Classifiers were trained and tested on individual participant data transformed into MNI standard space. The functional data were first z-scored per voxel within each run. The searchlight analysis was implemented by extracting the z-scored β values from spheres (6-mm radius) centered on each voxel in the masks. This sized sphere included ∼123 3-mm voxels (when not restricted by the brain's boundary; Kriegeskorte, Goebel, & Bandettini, 2006). Classification accuracy (proportion of correctly classified trials) for each sphere was assigned to the sphere's central voxel to produce accuracy maps. The resulting accuracy maps were then smoothed with a Gaussian kernel (6 mm FWHM). To determine whether accuracy maps were above chance levels (50%), individual accuracy maps were entered into a higher-level group analysis (mixed effects, FLAME; www.fmrib.ox.ac.uk/fsl), testing the accuracy values across participants against chance for each voxel. Voxel inclusion was set at z = 2.3, with a cluster significance threshold at FWE p < .01.
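
A compact sketch of the searchlight logic on a toy volume (sphere radius, array shapes, and masking are simplified assumptions; the study itself used pyMVPA):

```python
# Searchlight sketch: for each voxel, take a sphere of neighbors,
# cross-validate a linear SVM, and write accuracy back to the center voxel.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
shape = (10, 10, 10)                          # toy volume
n_trials = 32
data = rng.normal(size=(n_trials,) + shape)   # trial-wise beta maps
y = np.tile([0, 1], n_trials // 2)

coords = np.array(np.meshgrid(*map(np.arange, shape), indexing="ij"))
coords = coords.reshape(3, -1).T              # (n_voxels, 3) voxel coordinates
flat = data.reshape(n_trials, -1)

acc_map = np.zeros(coords.shape[0])
for i, c in enumerate(coords):
    sphere = np.linalg.norm(coords - c, axis=1) <= 2  # 2-voxel radius (~6 mm at 3 mm)
    acc_map[i] = cross_val_score(LinearSVC(C=1.0), flat[:, sphere], y, cv=4).mean()
acc_map = acc_map.reshape(shape)              # accuracy assigned to center voxels
```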

The following classification tests were performed: (1) Car versus Dog classifier: This examined whether patterns of activity conveyed information about conceptual identity by training a classifier to discriminate between periods of noise where participants were thinking about a dog and periods where they were thinking about a car. We were not able to successfully classify the semantic class (dog vs. car) in our data set at the whole-brain level. As this analysis revealed no regions across the cortex that could successfully decode this information, the remaining classification tests combined car and dog trials. (2) Auditory versus visual classifier: This examined whether patterns of activity conveyed information about the modality of imagery by training a classifier to discriminate between periods of noise where participants were thinking about the visual properties of objects and periods where they were thinking about the auditory properties of objects. (3) Visual versus context classifier: Here, a classifier was trained to discriminate between periods of noise where participants were thinking about the visual properties of objects and periods when they were thinking about objects in more complex conceptual contexts. (4) Auditory versus context classifier: Here, a classifier was trained to discriminate between periods of noise where participants were thinking about the auditory properties of objects and periods when they were thinking about objects in complex contexts. Unthresholded maps from all analyses are uploaded on Neurovault: neurovault.org/collections/2671/.

Next, we identified regions where patterns of activity consistently informed the classifier for each of our three conditions (visual, auditory, and context) by running a formal conjunction on the uncorrected searchlight maps (using the FSL easythresh command). For visual patterns, we looked at the conjunction of the two searchlight maps that decoded visual properties (visual vs. auditory and visual vs. context). Because regions that contributed to both of these searchlight maps were able to decode simple visual features in imagination, relative to both auditory features and more complex contexts, we reasoned that their pattern of activation related to simple visual features. Next, we looked at the conjunction of the two searchlight maps that decoded the auditory condition (auditory vs. visual and auditory vs. context) to identify brain regions containing patterns of activation relating to simple auditory properties in imagination. Finally, we looked at the conjunction of the two searchlight maps that decoded context properties (context vs. visual and context vs. auditory). This identified brain regions containing activation patterns relating to complex conceptual contexts, as distinct from both simple visual and auditory features. All analyses were cluster-corrected using a z-statistic threshold of 2.3 to define contiguous clusters. Multiple comparisons were controlled using Gaussian random field theory at a threshold of p < .01.
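
The conjunction step amounts to intersecting the two thresholded maps, which is equivalent to thresholding the voxel-wise minimum statistic. A sketch on toy z-maps (cluster correction, which easythresh handles, is omitted here):

```python
# Conjunction: voxels exceeding threshold in both searchlight z-maps.
import numpy as np

rng = np.random.default_rng(6)
z_vis_vs_aud = rng.normal(size=(10, 10, 10))  # toy group-level z-maps
z_vis_vs_ctx = rng.normal(size=(10, 10, 10))

z_thresh = 2.3
conjunction = (z_vis_vs_aud > z_thresh) & (z_vis_vs_ctx > z_thresh)
# Equivalent: np.minimum(z_vis_vs_aud, z_vis_vs_ctx) > z_thresh
print(conjunction.sum(), "voxels decode visual features in both contrasts")
```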

Univariate Analysis

We examined univariate activation to further characterize the response within our unimodal and transmodal regions defined by MVPA. The percent signal change was extracted for each condition from ROIs defined by the MVPA conjunctions (see above).

Resting-state fMRI

Participants

This analysis was performed on a separate cohort of 157 healthy participants at York Neuroimaging Centre (89 women; mean age = 20.31 years, range = 18–31 years). Participants completed a 9-min functional connectivity MRI scan during which they were asked to rest in the scanner with their eyes open. Using these data, we examined the resting-state fMRI connectivity of our conjunction regions that were informative for decoding visual, auditory, and contextual imagery, to investigate whether these regions fell within similar or distinct networks. The data from our resting-state scans have been used in prior published work from the same lab (e.g., Murphy et al., 2017, 2018; Villena-Gonzalez et al., 2018; Wang et al., 2018; Poerio et al., 2017; Sormaz et al., 2017; Vatansever et al., 2017).

Acquisition

As with the functional experiment, a Magnex head-dedicated gradient insert coil was used in conjunction with a birdcage radio-frequency coil tuned to 127.4 MHz. For the resting-state data, a gradient-echo EPI sequence was used to collect data from 60 axial slices with an interleaved (bottom–up) acquisition order, with the following parameters: TR = 3 sec, TE = minimum full, volumes = 180, flip angle = 90°, matrix size = 64 × 64, FOV = 192 × 192 mm, voxel size = 3 × 3 × 3 mm. A minimum-full TE was selected to optimize image quality (as opposed to selecting a value less than minimum full, which, for instance, would be beneficial for obtaining more slices per TR). Functional images were coregistered onto a T1-weighted anatomical image from each participant (TR = 7.8 sec, TE = 3 msec, FOV = 290 mm × 290 mm, matrix size = 256 mm × 256 mm, voxel size = 1 mm × 1 mm × 1 mm).

Preprocessing

Data were preprocessed using the FSL toolbox (www.fmrib.ox.ac.uk/fsl). Before conducting the functional connectivity analysis, the following prestatistics processing was applied to the resting-state data: motion correction using MCFLIRT to safeguard against motion-related spurious correlations; slice-timing correction using Fourier-space time-series phase shifting; nonbrain removal using a brain extraction tool; spatial smoothing using a Gaussian kernel of FWHM 6 mm; grand-mean intensity normalization of the entire 4-D data set by a single multiplicative factor; high-pass temporal filtering (Gaussian-weighted least-squares straight-line fitting, with sigma = 100 sec); and Gaussian low-pass temporal filtering, with sigma = 2.8 sec.

Low-level Analysis

For each conjunction site, we created spherical seed ROIs, 6 mm in diameter, centered on the peak conjunction voxel: the visual conjunction site in the left inferior lateral occipital cortex [−48 −70 −2], the auditory conjunction site in the left superior temporal gyrus [−48 −12 −10], and the context conjunction site in the left lateral occipital cortex [−48 −60 0] (see Supplementary Table A2). This ensured that we assessed the functional connectivity of a key site even when the searchlight conjunction revealed a large cluster or multiple clusters. The time series of these regions were extracted and used as explanatory variables in a separate participant-level functional connectivity analysis for each seed. Participant-specific nuisance regressors were determined using a component-based noise correction (CompCor) approach (Behzadi, Restom, Liau, & Liu, 2007). This method applies PCA to the fMRI signal from participant-specific white matter (WM) and CSF ROIs. In total, there were 11 nuisance regressors: five regressors from CompCor and a further six identified during motion correction with MCFLIRT. These principal components were then removed from the fMRI data through linear regression. The WM and CSF covariates were generated by segmenting each individual's high-resolution structural image (using FAST in FSL; Zhang, Brady, & Smith, 2001). The default tissue probability maps, referred to as prior probability maps, were registered to each individual's high-resolution structural image (T1 space), and the overlap between these prior probability maps and the corresponding CSF and WM maps was identified. These maps were then thresholded (40% for the CSF and 66% for the WM), binarized, and combined. The six motion parameters were calculated in the motion correction step during preprocessing. Movement in each of the three Cartesian directions (x, y, z) and rotational movement around three axes (pitch, yaw, roll) were included for each individual.
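
A minimal numpy sketch of this CompCor-style cleanup, under our own simplifying assumptions (five PCA components from the noise ROIs plus six motion parameters, removed by ordinary least squares):

```python
# aCompCor-style nuisance regression (Behzadi et al., 2007), simplified.
import numpy as np

def compcor_clean(data, noise_voxels, motion, n_comp=5):
    """data: (time, voxels); noise_voxels: (time, k) WM/CSF signals; motion: (time, 6)."""
    centered = noise_voxels - noise_voxels.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = u[:, :n_comp]              # five CompCor time courses
    # 11 nuisance regressors (5 CompCor + 6 motion) plus an intercept.
    nuisance = np.column_stack([components, motion, np.ones(len(data))])
    beta, *_ = np.linalg.lstsq(nuisance, data, rcond=None)
    return data - nuisance @ beta           # residualized, "cleaned" data
```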

High-level Analysis

At the group level, the data were processed using FEAT Version 5.98 within FSL (www.fmrib.ox.ac.uk/fsl), and the analyses were carried out using FMRIB's Local Analysis of Mixed Effects (FLAME) Stage 1 with automatic outlier detection. No global signal regression was performed. The z-statistic images were then thresholded using clusters determined by z > 2.3 and a cluster-corrected significance threshold of p = .01. Finally, to determine whether our connectivity maps overlapped with one another, we calculated the number of overlapping voxels for our three conjunction site connectivity maps.
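
The final overlap step is simply voxel counting on the thresholded, binarized connectivity maps. A small sketch (placeholder file names; threshold z > 2.3 as above):

```python
# Count voxels shared by the three seed-based connectivity maps.
import nibabel as nib
import numpy as np

files = ("visual_seed_z.nii.gz", "auditory_seed_z.nii.gz", "context_seed_z.nii.gz")
masks = [nib.load(f).get_fdata() > 2.3 for f in files]   # binarize at z > 2.3

vis_aud_overlap = np.logical_and(masks[0], masks[1]).sum()
all_three_overlap = np.logical_and.reduce(masks).sum()   # voxels common to all maps
```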


Materials and Methods

Participants.

Twenty-five healthy volunteers took part in the initial experiment conducted in Trento, and 21 healthy volunteers took part in the control experiment performed in Salzburg. All participants had normal or corrected-to-normal vision and no neurological or psychiatric disorders. Three participants from the initial experiment and one participant from the control experiment were excluded from the analysis due to excessive artifacts in the MEG data, leaving an insufficient number of trials per condition after artifact rejection (fewer than 30 trials for at least one condition). Additionally, within each experiment six participants were discarded from the analysis because their false alarm rate exceeded 30% and/or their near-threshold detection rate was over 85% or below 15% for at least one sensory modality (due to threshold identification failure or difficulty using the response-button mapping during the control experiment, likewise leaving fewer than 30 detected or undetected trials in one sensory modality). The remaining 16 participants (11 females, mean age 28.8 y, SD 3.4 y) in the initial experiment and 14 participants (9 females, mean age 26.4 y, SD 6.4 y) in the control experiment reported normal tactile and auditory perception. The ethics committees of the University of Trento and the University of Salzburg, respectively, approved the experimental protocols, and each participant gave written informed consent.

Stimuli.

To ensure that participants did not hear any auditory cues caused by the piezo-electric stimulator during tactile stimulation, binaural white noise was presented during the entire experiment (training blocks included). Auditory stimuli were presented binaurally using MEG-compatible tubal in-ear headphones (SOUNDPixx; VPixx Technologies). Short bursts of white noise with a length of 50 ms were generated with Matlab and multiplied with a Hanning window to obtain a soft on- and offset. Participants had to detect short white noise bursts presented near their hearing threshold (27). The intensity of these transient target auditory stimuli was determined prior to the experiment so that they emerged from the constant background white noise stimulation. Visual stimuli were Gabor ellipsoids (tilted 45°; 1.4° radius; frequency, 0.1 Hz; phase, 90; sigma of Gaussian, 10) back-projected on a translucent screen by a Propixx DLP projector (VPixx Technologies) at a refresh rate of 180 frames per second. On the black screen background, a centered gray fixation circle (2.5° radius) with a central white dot was used as a fixation point. The stimuli were presented for 50 ms in the center of the screen at a viewing distance of 110 cm. Tactile stimuli were delivered as a 50-ms stimulation to the tip of the left index finger, using one finger module of a piezo-electric stimulator (Quaerosys) with 2 × 4 rods, which can be raised to a maximum of 1 mm. The module was attached to the finger with tape, and the participant's left hand was cushioned to prevent any unintended pressure on the module (28). For the control experiment (conducted in another laboratory, i.e., Salzburg), the visual, auditory, and tactile stimulation setups were identical except that we used a different MEG/MRI vibrotactile stimulator system (CM3; Cortical Metrics).

Task and Design.

The participants performed three blocks of a near-threshold (NT) perception task. Each block included three separate runs (100 trials each), one for each sensory modality: tactile (T), auditory (A), and visual (V). A short break (∼1 min) separated each run, and longer breaks (∼4 min) were provided after each block. Within a block, runs alternated in the same order within subject, and the order was pseudorandomized across subjects (i.e., subject 1 = TVA-TVA-TVA; subject 2 = VAT-VAT-VAT; …). Participants were asked to fixate on a central white dot in a gray central circle at the center of the screen throughout the whole experiment to minimize eye movements.

A short training run with 20 trials was conducted to ensure that participants had understood the task. Then, in three different training sessions prior to the main experiment, participants’ individual perceptual thresholds (tactile, auditory, and visual) were determined in the shielded room. For the initial experiment, a one-up/one-down staircase procedure with two randomly interleaved staircases (one upward and one downward) was used with fixed step sizes. For the control experiment we used a Bayesian active sampling protocol to estimate psychometric slope and threshold for each participant (62). Once determined by these staircase procedures, all near-threshold stimulation intensities remained stable during each block of the whole experiment for a given participant. All stimulation intensities can be found in SI Appendix, Table S1.
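
As a simplified illustration of the fixed-step staircase logic (the actual procedure interleaved two staircases, one starting above and one below threshold; this single-track sketch and its parameters are ours):

```python
# One-up/one-down staircase with a fixed step size, converging near the
# 50%-detection intensity.
def run_staircase(respond, intensity, step, n_trials=40):
    """respond(intensity) -> True if the stimulus was detected on this trial."""
    history = []
    for _ in range(n_trials):
        detected = respond(intensity)
        history.append((intensity, detected))
        # Decrease intensity after a detection, increase after a miss.
        intensity += -step if detected else step
    return history
```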

The main experiment consisted of a detection task (Fig. 1A). At the beginning of each run, participants were told that on each trial a weak stimulus (tactile, auditory, or visual, depending on the run) could be presented at a random time interval. Five hundred milliseconds after target stimulus onset, participants were prompted by an on-screen question mark to indicate whether they had felt the stimulus (maximal response time: 2 s). Responses were given using MEG-compatible response boxes with the right index finger and middle finger (response-button mapping was counterbalanced across participants). Trials were then classified into hits (detected stimulus) and misses (undetected stimulus) according to the participants' answers. Trials with no response were rejected. Catch trials (stimulation intensity above the perceptual threshold) and sham trials (no stimulation) were used to track false alarm and correct rejection rates across the experiment. Overall, there were nine runs with 100 trials each (300 trials in total for each sensory modality). Each trial started with a variable interval (1.3 to 1.8 s, randomly distributed), followed by an experimental near-threshold stimulus (80 per run), a sham stimulus (10 per run), or a catch stimulus (10 per run), each lasting 50 ms. Each run lasted ∼5 min, and the whole experiment lasted ∼1 h.

Identical timing parameters were used in the control experiment. However, a specific response-screen design was used to control for motor response mapping. On each trial, participants had to use a response mapping determined by the color of the circle surrounding the question mark on the response screen. Two colors (blue or yellow) were used and presented randomly after each trial during the control experiment. One color was associated with the response mapping rule "press the button only if there is a stimulation" (for the near-threshold condition, detected) and the other color with the opposite rule "press the button only if there is no stimulation" (for the near-threshold condition, undetected). The association between a response mapping and a specific color (blue or yellow) was fixed for a single participant but assigned randomly across participants. Importantly, by delaying the response mapping until after stimulus presentation in a manner unpredictable to the participant, neural patterns during the relevant periods putatively cannot be confounded by response selection or preparation. Both experiments were programmed in Matlab using the open-source Psychophysics Toolbox (63).

MEG Data Acquisition and Preprocessing.

MEG was recorded at a sampling rate of 1 kHz using a 306-channel (204 first-order planar gradiometers, 102 magnetometers) VectorView MEG system for the first experiment in Trento and a Triux MEG system for the control experiment in Salzburg (both Elekta-Neuromag Ltd.), each in a magnetically shielded room (AK3B; Vakuumschmelze). Before the experiments, individual head shapes were acquired for each participant, including fiducials (nasion and preauricular points) and around 300 digitized points on the scalp, with a Polhemus Fastrak digitizer. The head position of each individual relative to the MEG sensors was continuously monitored within a run using five coils. Head movements did not exceed 1 cm within and between blocks.

Data were analyzed using the Fieldtrip toolbox (64) and the CoSMoMVPA toolbox (65) in combination with MATLAB 8.5 (MathWorks). First, a high-pass filter at 0.1 Hz (FIR filter with transition bandwidth 0.1 Hz) was applied to the continuous data. The data were then segmented from 1,000 ms before to 1,000 ms after target stimulation onset and down-sampled to 512 Hz. Trials containing physiological or acquisition artifacts were rejected. A semiautomatic artifact detection routine identified statistical outliers among trials and channels in the datasets using a set of summary statistics (variance, maximum absolute amplitude, maximum z value). These trials and channels were removed from each dataset. Finally, the data were visually inspected and any remaining trials and channels with artifacts were removed manually. Across subjects, an average of five channels (±2 SD) was rejected. Bad channels were excluded from the whole dataset. A detailed report of the remaining number of trials per condition for each participant can be found in SI Appendix, Table S1. Finally, in all further analyses and within each sensory modality for each subject, an equal number of detected and undetected trials was randomly selected to prevent any bias across conditions (66).
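
The last step, matching trial counts across conditions, amounts to random subsampling of the larger class. A minimal sketch (the array layout is our assumption):

```python
# Randomly subsample so detected and undetected classes have equal trial counts.
import numpy as np

rng = np.random.default_rng(seed=0)

def equalize_trials(detected, undetected):
    """detected/undetected: (trials, channels, time) single-trial arrays."""
    n = min(len(detected), len(undetected))
    pick = lambda x: x[rng.choice(len(x), size=n, replace=False)]
    return pick(detected), pick(undetected)
```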

Source Analyses.

Neural activity evoked by stimulus onset was investigated by computing ERFs. For all source-level analyses, the preprocessed data were low-pass filtered at 30 Hz and projected to source level using a linearly constrained minimum variance (LCMV) beamformer analysis (67). For each participant, realistically shaped single-shell head models (68) were computed by coregistering the participant's head shape either with their structural MRI or, when no individual MRI was available (three participants in the initial experiment and two in the control experiment), with a standard brain from the Montreal Neurological Institute (MNI) warped to the individual head shape. A grid with 1.5-cm resolution based on an MNI template brain was morphed into the brain volume of each participant. A common spatial filter (for each grid point and each participant) was computed using the lead fields and the common covariance matrix, taking into account the data from both conditions (detected and undetected, or catch and sham) for each sensory modality separately. The covariance window for the beamformer filter calculation spanned 200 ms pre- to 500 ms poststimulus. Using this common filter, the spatial power distribution was then estimated for each trial separately. The resulting data were averaged relative to stimulus onset in all conditions (detected, undetected, catch, and sham) for each sensory modality. For visualization purposes only, a baseline correction was applied to the averaged source-level data by subtracting the time window from 200 ms prestimulus to stimulus onset. Based on a significant difference between the event-related fields of the two conditions over time for each sensory modality, source localization was restricted to specific time windows of interest. All source images were interpolated from the original resolution onto an inflated surface of an MNI template brain available within the Caret software package (69). The respective MNI coordinates and labels of localized brain regions were identified with an anatomical brain atlas (AAL atlas; ref. 70) and a network parcellation atlas (29). Source analysis of MEG data is an inherently underspecified problem and no unique solution exists. Furthermore, source leakage cannot be avoided, further reducing the accuracy of any analysis. Finally, we remind the reader that we do not expect more than 3 cm precision in our results because we used standard LCMV source localization with a 1.5-cm grid. In other words, source plots should be seen as suggestive rather than conclusive evidence about the underlying brain regions.
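
The core of the LCMV step is a closed-form spatial filter per grid point, W = (LᵀC⁻¹L)⁻¹LᵀC⁻¹, built from the lead field L and the sensor covariance C. A minimal numpy sketch (the regularization factor is our assumption, not the paper's setting):

```python
# Unit-gain LCMV spatial filter for a single source grid point.
import numpy as np

def lcmv_filter(leadfield, cov, reg=0.05):
    """leadfield: (channels, 3) forward model; cov: (channels, channels) covariance."""
    n = cov.shape[0]
    c_inv = np.linalg.inv(cov + reg * (np.trace(cov) / n) * np.eye(n))  # regularized
    # W = (L' C^-1 L)^-1 L' C^-1, mapping sensor data to a 3-D source moment.
    return np.linalg.solve(leadfield.T @ c_inv @ leadfield, leadfield.T @ c_inv)

# source_timecourse = lcmv_filter(L, C) @ sensor_data   # shape: (3, time)
```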

MVPA Decoding.

MVPA decoding was performed for the period 0 to 500 ms after stimulus onset based on normalized (z-scored) single-trial source data down-sampled to 100 Hz (i.e., time steps of 10 ms). We used multivariate pattern analysis as implemented in CoSMoMVPA (65) to identify when, and in what form, a common network across sensory modalities is activated during the near-threshold detection task. We defined two classes for the decoding, corresponding to the behavioral outcome of the task (detected and undetected). For decoding within the same sensory modality, single-trial source data were randomly assigned to one of two chunks (each containing half of the original data).

For decoding across all sensory modalities together, single-trial source data were pseudorandomly assigned to one of the two chunks, with half of the original data for each sensory modality in each chunk. Data were classified using a twofold cross-validation procedure, in which a naive Bayes classifier predicted trial conditions in one chunk after training on the data from the other chunk. For decoding between different sensory modalities, single-trial source data of one modality were assigned to the testing chunk and the trials from the other modalities were assigned to the training chunk. The number of target categories (e.g., detected/undetected) was balanced in each training partition and for each sensory modality. Trial data were partitioned equally into the chunks (i.e., each chunk used for classification contained the same number of trials from the three runs and blocks for each modality). Training and testing partitions always contained different sets of data.
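
The study's classifier and cross-validation were implemented in CoSMoMVPA (MATLAB); a scikit-learn stand-in for the same twofold and cross-modal schemes might look like this (shapes and names are ours):

```python
# Twofold cross-validated naive-Bayes decoding of detected vs. undetected trials.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def twofold_accuracy(X1, y1, X2, y2):
    """X*: (trials, source features) per chunk; y*: binary labels."""
    folds = [(X1, y1, X2, y2), (X2, y2, X1, y1)]
    return np.mean([GaussianNB().fit(Xtr, ytr).score(Xte, yte)
                    for Xtr, ytr, Xte, yte in folds])

# Cross-modal decoding: train on one modality, test on another, e.g.
# acc = GaussianNB().fit(X_tactile, y_tactile).score(X_visual, y_visual)
```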

First, the temporal generalization method was used to explore the ability of each classifier, trained at a given time point in the training set, to generalize to every time point in the testing set (21). In this analysis we used local neighborhood features in time (time radius of 10 ms: for each time step we included the previous and next time samples as additional features). We generated temporal generalization matrices of task decoding accuracy (detected/undetected), mapping the time at which the classifier was trained against the time at which it was tested. Generalization of decoding accuracy over time was calculated for all trials and depended systematically on the specific between- or within-sensory-modality decoding. The reported average accuracy of the classifier for each time point corresponds to the group average of the individual effect size: the ability of classifiers to discriminate detected from undetected trials. We summarized time generalization by keeping only significant accuracies for each sensory modality decoding. Significant classifier accuracies were normalized between 0 and 1:

y_t = (x_t − min(x)) / (max(x) − min(x)),  [1]

where x is the set of all significant decoding accuracies and x_t is a given significant accuracy at time t. Normalized accuracies (y_t) were then averaged across significant testing times and decoding conditions. The number of significant classifier generalizations across testing time points and the corresponding averaged normalized accuracies were reported along the training time dimension (Figs. 3B and 5B). For all significant time points previously identified, we performed a "searchlight" analysis across brain sources and the time neighborhood structure. In this analysis we used local neighborhood features in source and time space, with a time radius of 10 ms and a source radius of 3 cm. All significant searchlight accuracy results were averaged over time, and only the top 10% of significant accuracies were reported on brain maps for each sensory modality decoding condition (Fig. 4) or for all conditions together (Fig. 5C).
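
Eq. 1 is an ordinary min–max rescaling; in code (variable names ours):

```python
# Min-max normalization of significant decoding accuracies (Eq. 1).
import numpy as np

def normalize_accuracies(x):
    """x: array of significant accuracies; returns values rescaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```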

Finally, we applied the same type of analysis to all sensory modalities by taking all blocks together, with detected and undetected NT trials equalized within each sensory modality. For the control experiment, we equalized trials based on the 2 × 2 design of detection report (detected or undetected) and type of response ("button press = response" or "no response"), so that each category (i.e., class) contained the same number of trials for each sensory modality. We performed a similar decoding analysis using a different class definition: either detected vs. undetected or response vs. no response (SI Appendix, Fig. S3 B and C).

Statistical Analysis.

Detection rates for the experimental trials were statistically compared to those from the catch and sham trials, using dependent samples t tests. Concerning the MEG data, the main statistical contrast was between trials in which participants reported a stimulus detection and trials in which they did not (detected vs. undetected).

The evoked response at the source level was tested at the group level for each of the sensory modalities. To eliminate polarity, statistics were computed on the absolute values of source-level event-related responses. Based on the global average of all grid points, we first identified relevant time periods with maximal difference between conditions (detected vs. undetected) by performing group analysis with sequential dependent t tests between 0 and 500 ms after stimulus onset using a sliding window of 30 ms with 10 ms overlap. P values were corrected for multiple comparisons using Bonferroni correction. Then, to derive the contributing spatial generators of this effect, the conditions detected and undetected were contrasted for the specific time periods with group statistical analysis using nonparametric cluster-based permutation tests with Monte Carlo randomization across grid points controlling for multiple comparisons (71).

The multivariate searchlight analysis results discriminating between conditions were tested at the group level by comparing the resulting individual accuracy maps against chance level (50%) using a nonparametric approach implemented in CoSMoMVPA (65), adopting 10,000 permutations to generate a null distribution. P values were set at P < 0.005 for cluster-level correction to control for multiple comparisons using a threshold-free method for clustering (72), which has been used and validated for MEG/EEG data (40, 73). The time generalization results at the group level were thresholded using a mask with corrected z score > 2.58 (or Pcorrected < 0.005) (Figs. 3A and 5A). Time points exceeding this threshold were identified and reported for each training data time course to visualize how long time generalization was significant over testing data (Figs. 3B and 5B). Significant accuracy brain maps resulting from the searchlight analysis on previously identified time points were reported for each decoding condition. The maximum 10% of averaged accuracies were depicted for each significant decoding cluster on brain maps (Figs. 4 and 5).

Data Availability.

A down-sampled (to 100 Hz) version of the data is available at the OSF public repository (https://osf.io/E5PMY/). The original nonresampled raw data are available, upon reasonable request, from the corresponding author. Data analysis code is available at the corresponding author’s GitLab repository (https://gitlab.com/gaetansanchez).


Decoding moral judgments from neural representations of intentions

Intentional harms are typically judged to be morally worse than accidental harms. Distinguishing between intentional harms and accidents depends on the capacity for mental state reasoning (i.e., reasoning about beliefs and intentions), which is supported by a group of brain regions including the right temporo-parietal junction (RTPJ). Prior research has found that interfering with activity in RTPJ can impair mental state reasoning for moral judgment and that high-functioning individuals with autism spectrum disorders make moral judgments based less on intent information than neurotypical participants. Three experiments, using multivoxel pattern analysis, find that (i) in neurotypical adults, the RTPJ shows reliable and distinct spatial patterns of responses across voxels for intentional vs. accidental harms, and (ii) individual differences in this neural pattern predict differences in participants’ moral judgments. These effects are specific to RTPJ. By contrast, (iii) this distinction was absent in adults with autism spectrum disorders. We conclude that multivoxel pattern analysis can detect features of mental state representations (e.g., intent), and that the corresponding neural patterns are behaviorally and clinically relevant.

Thinking about another’s thoughts increases metabolic activity in a specific group of brain regions. These regions, which comprise the “theory of mind network,” include the medial prefrontal cortex (MPFC), precuneus (PC), right superior temporal sulcus (RSTS), and bilateral temporal-parietal junction (TPJ). Although many studies have investigated the selectivity and domain specificity of these brain regions for theory of mind (1, 2), a distinct but fundamental question concerns the computational roles of these regions: which features of people’s beliefs and intentions are represented, or made explicit, in these brain regions? Prior work has focused on where in the brain mental state reasoning occurs, whereas the present research builds on this work to investigate how neural populations encode these concepts.

A powerful approach for understanding neural representation in other domains has been to ask which features of a stimulus can be linearly decoded from a population of neurons. For example, in the ventral visual stream (involved in object recognition), low-level stimulus properties like line orientation and shading are linearly decodable from small populations of neurons in early visual areas (e.g., V1), whereas in higher-level regions, the identity of an object becomes linearly decodable and invariant across viewing conditions (3, 4). These results suggest that as information propagates through the ventral pathway, the neural response is reformatted to make features that are relevant to object identity more explicit to the next layer of neurons (3).

A decoding approach can be similarly applied to functional MRI (fMRI) data, using multivoxel pattern analysis (MVPA) to examine the spatial pattern of neural response within a brain region. If a distinction between cognitive tasks, stimulus categories, or stimulus features is coded in the population of neurons within a brain region, and if the subpopulations within the region are (at least partially) organized into spatial clusters or maps over cortex (5, 6), then the target distinction may be detectable in reliable spatial patterns of activity measurable with fMRI (7–9). MVPA has therefore been used to identify categories and features that are represented within a single region (10–12) and to relate these representations to behavioral performance (7, 9, 13).

Compared with object recognition, much less is known about the cognitive and neural mechanisms that support theory of mind. However, linear separability of the neural response could serve as a diagnostic measure of the core features and local computations even in this abstract domain. We therefore asked whether the spatial pattern of response in theory of mind brain regions could be used to decode a feature that has previously been shown to be critical for theory of mind: whether an action was performed intentionally or accidentally.

The distinction between intentional and accidental acts is particularly salient in the case of moral cognition. Adults typically judge the same harmful act (e.g., putting poison in a drink, failing to help someone who is hurt, making an insensitive remark) to be more morally wrong and more deserving of punishment when committed intentionally vs. accidentally (14). These moral judgments depend on individuals’ ability to consider another person’s beliefs, intentions, and knowledge, and emerge relatively late in childhood, around age 6–7 y (15). Individuals with autism spectrum disorders (ASD), who are disproportionately impaired on tasks that require them to consider people’s beliefs and intentions (16, 17), are also impaired in using information about an innocent intention to forgive someone for accidentally causing harm (18–20, but see ref. 21).

The right TPJ (RTPJ) is particularly implicated in these moral judgments. In prior research, increased RTPJ activation is related to greater consideration of mitigating intentions and more lenient punishment (22, 23); individual differences in the forgiveness of accidental harms are correlated with the magnitude of activity in the RTPJ at the time of the judgment (24); and interfering with activity in the RTPJ shifts moral judgments away from reliance on mental states (25).

Given the importance of intent for moral judgments of harms, we predicted that one or more of the brain regions in the theory of mind network would explicitly encode this feature of others’ mental states in neurotypical (NT) adults. That is, we predicted that (i) while participants read about a range of harmful acts, we would be able to decode whether the harm was intentional or accidental based on the spatial pattern of activity within theory of mind brain regions. We tested this prediction in three experiments with NT adults. We also investigated (ii) whether the robustness of the spatial pattern within individuals would predict those individuals’ moral judgments and (iii) whether, in a fourth experiment, high-functioning adults with ASD, who make atypical moral judgments of accidental harms, would show atypical patterns of neural activity in pattern or magnitude.

In all four experiments, participants in the scanner read short narratives in which someone caused harm to another individual, intentionally or accidentally (Fig. 1), as well as narratives involving no harm. Participants in experiments 1 and 2 made a moral judgment about the action. Participants in experiment 3 made true/false judgments about facts from the narratives. In experiment 4, high-functioning adults with ASD read and made moral judgments about the same narratives as in experiment 1.


Results

Behavioral results

All participants identified catch trials with high accuracy. Mean error rates were 5.0 ± 0.8% (SEM). Reaction times for correct responses (measured with respect to video onset) were 1953 ± 25 ms (SEM).

Univariate fMRI results

To determine ROIs for subsequent MVPA, we computed a group contrast of all eight conditions (Action × Object Category × Kinematics) versus baseline (see Materials and Methods, ROI definition). This revealed widespread activations within left and right ventral and dorsal premotor cortex, intraparietal sulcus (IPS), and occipitotemporal cortex extending dorsally into posterior IPS and ventrally into middle and inferior temporal gyrus. Peak Talairach coordinates identified in the group contrast for the ROI MVPA were as follows: −47/0/27 (left PMv), 53/0/36 (right PMv), −43/−36/39 (left IPL), 35/−35/46 (right IPL), −43/−69/−2 (left LOTC), and 43/−65/1 (right LOTC).

In addition, we computed a univariate contrast "open" versus "close" (collapsed across object category and kinematics) to test for putative univariate effects. This contrast revealed no significant effects, even at a very liberal voxel-level threshold of p = 0.05. The lack of significant differences in the univariate contrast suggests that activation levels were comparable across the two actions.

ROI MVPA results

In a ROI-based MVPA, we investigated the degree of generality of action representations (see Materials and Methods for details of the procedure) in regions typically associated with action observation, i.e., PMv, IPL, and LOTC (Fig. 2).

ROI MVPA results. Mean classification accuracies for decoding at concrete (red), intermediate (green), and abstract (blue) levels. Error bars indicate SEM; asterisks indicate statistical significance (different from chance = 50%; red = FDR corrected for the number of tests). The dotted line represents chance-level decoding accuracy (50%).

In IPL and LOTC, we could decode actions at all levels of abstraction, while in PMv, we could only decode actions at the concrete level (FDR corrected α = 0.034). A three-way repeated-measures ANOVA with the factors Abstraction Level × ROI × Hemisphere revealed main effects of Abstraction Level (F(2,378) = 10.23, p < 0.001) and ROI (F(2,378) = 18.93, p < 0.001). No effects of Hemisphere and no interactions were observed (all p > 0.1). Post hoc paired samples t tests revealed that decoding accuracies for intermediate and abstract levels were significantly higher in IPL compared with PMv and in LOTC compared with PMv (Table 1). In addition, accuracies in PMv differed significantly between concrete and abstract levels (Table 2). These results demonstrate that IPL and LOTC, but not PMv, encode actions at abstract levels of representation.

Results of post hoc paired samples t test between ROIs

Results of post hoc paired samples t test between abstraction levels

A second observation is that LOTC showed significantly stronger decoding for the concrete compared with the intermediate and abstract levels, whereas IPL showed relatively similar decoding accuracies across all levels (Table 2). This indicates that LOTC contains both concrete and more abstract representations, whereas IPL contains abstract action representations only, suggesting that generalization from perceptual to conceptual action representations takes place in LOTC (see Materials and Methods, ROI MVPA for a detailed description of expected patterns of results).

Finally, in all regions, decoding accuracies for intermediate and abstract levels were at similar levels and did not show significant differences (Table 2), suggesting that generalization from concrete (object-specific) to abstract (object category-independent) action representations does not require an additional, intermediate (object-independent but object category-specific) abstraction step (see Materials and Methods, ROI MVPA for a detailed description of expected patterns of results).

Searchlight MVPA results

A searchlight analysis corroborated the findings of the ROI MVPA (Figs. 3, 4, Table 3): at the concrete level, we decoded actions in both hemispheres throughout the occipitotemporal cortex, postcentral sulcus (PoCS), IPS, and ventral as well as dorsal premotor cortex. At intermediate and abstract levels, we decoded actions in bilateral posterior middle temporal gyrus (pMTG)/inferior temporal sulcus (pITS) and PoCS (at the junction to anterior IPS), but not in areas anterior to the postcentral sulcus.

Mean accuracy maps of the searchlight MVPA at each abstraction level (concrete, intermediate, and abstract). Individual accuracy maps were cortex-based aligned, averaged, and projected onto a common group surface (both flat maps and lateral views of inflated hemispheres). Decoding accuracy at chance is 50%. CS, central sulcus; IFS, inferior frontal sulcus; IPS, intraparietal sulcus; ITS, inferior temporal sulcus; PrCS, precentral sulcus; PoCS, postcentral sulcus; SFS, superior frontal sulcus; STS, superior temporal sulcus.

Statistical maps of the searchlight MVPA. For intermediate and abstract levels, conjunctions (i.e., lowest common t value per vertex) of concrete/intermediate and concrete/intermediate/abstract levels, respectively, were used (see Materials and Methods for details). Alignment and projection procedures are the same as in Figure 3. Outlines around clusters indicate clusters surviving cluster-size correction (dark red, concrete; dark green, intermediate; dark blue, abstract; thresholded at p = 0.005, corrected cluster threshold p = 0.05). CS, central sulcus; IFS, inferior frontal sulcus; IPS, intraparietal sulcus; ITS, inferior temporal sulcus; PrCS, precentral sulcus; PoCS, postcentral sulcus; SFS, superior frontal sulcus; STS, superior temporal sulcus.

Clusters identified in the searchlight MVP analysis for action decoding at concrete, intermediate, and abstract levels


Difference between reverse inference and decoding (e.g. MVPA) in fMRI

Decoding as a special case of reverse inference. Bayes' formula: P(M|A) = P(A|M∩Task) × P(M|Task) / [P(A|M∩Task) × P(M|Task) + P(A|¬M∩Task) × P(¬M|Task)], where M is the mental process of interest and A the observed activation.
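
A tiny numeric sketch of what this posterior implies (the numbers below are invented for illustration): even a region that responds reliably to a mental process supports only a weak reverse inference if it also responds in many other tasks or if the prior is low.

```python
# Posterior P(M | A, Task) for a reverse inference, via Bayes' rule.
def reverse_inference(p_a_given_m, p_a_given_not_m, p_m):
    """p_a_given_m = P(A|M∩Task), p_a_given_not_m = P(A|¬M∩Task), p_m = P(M|Task)."""
    numerator = p_a_given_m * p_m
    return numerator / (numerator + p_a_given_not_m * (1.0 - p_m))

# Made-up example: activation in 80% of "love" tasks, 30% of other tasks,
# and a 10% prior -> posterior of only ~23% that the process is engaged.
print(reverse_inference(0.8, 0.3, 0.1))  # ~0.229
```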





METHODS

Participants

Experimental procedures were approved by the University of New South Wales human research ethics committee (HREC No. HC12030). All methods in this study were performed in accordance with the guidelines and regulations from the Australian National Statement on Ethical Conduct in Human Research (www.nhmrc.gov.au/guidelines-publications/e72). All participants gave informed written consent to participate in the experiment. We tested 15 participants (four women) aged 29.6 ± 1.4 (mean ± SEM) years. For analyses discriminating successful from failed suppression trials, a subset of eight participants (two women, aged 32 ± 2.1 years) was considered using a selection criterion of having at least 25% of failed or successful suppression trials. In other words, participants with failed/successful trial ratios more skewed than 1:3 were discarded.

We selected the sample size based on previous studies (Koenig-Robert & Pearson, 2019; Aso et al., 2016; Soon, Brass, Heinze, & Haynes, 2008) to meet standard criteria of statistical power. We conducted post hoc power analyses to ascertain the power achieved, using G*Power (Faul, Erdfelder, Buchner, & Lang, 2009). For the GLM analyses, the power achieved was at least 0.91 at the voxel level (n = 15). For the decoding analysis discriminating the content of successful suppression, we achieved a power of at least 0.79 to detect differences in the paired t test for the relevant conditions (n = 8).

Functional and Structural MRI Parameters

Scans were performed at the Neuroscience Research Australia facility, Sydney, Australia, in a Philips 3-T Achieva TX MRI scanner using a 32-channel head coil. Structural images were acquired using a turbo field echo sequence consisting of 256 T1-weighted sagittal slices covering the whole brain (flip angle = 8°, matrix size = 256 × 256, voxel size = 1 mm isotropic). Functional T2*-weighted images were acquired using an EPI sequence with 31 slices (flip angle = 90°, matrix size = 240 × 240, voxel size = 3 mm isotropic, repetition time = 2000 msec, echo time = 40 msec).

Suppression/Imagery Task

We adapted the behavioral task from a previous study from our group (Kwok et al., 2019) to satisfy fMRI requirements. We instructed participants to either imagine or suppress (avoid imagining) the visual thought of either a red apple or a green broccoli (see Figure 1A). Each trial started with a written object cue reading "green broccoli" or "red apple" for 2 sec. After this, a task cue was shown, reading either "imagine" or "suppress," for 2 sec. Participants were instructed either to visualize the cued object as vividly as they could (imagery period, imagine condition) or to avoid thinking about the cued object (suppression period, suppress condition) for 12 sec. Importantly, we extensively instructed participants not to use object substitution (imagining another item to avoid imagining the cued object), as in our previous behavioral study (Kwok et al., 2019), where we tested both suppression and substitution strategies; there we observed that object substitution led to very different outcomes compared to object suppression (i.e., substitution led to better thought control). A fixation point was shown on the screen, and participants were required to fixate. In the suppression condition, participants were instructed to press a button as soon as they detected that the visual thought of the to-be-suppressed object had appeared in their minds. We labeled such events "suppression breaks," and the trial was labeled a failed suppression trial. The suppression break button could be pressed multiple times within the 12 sec (Supplementary Figure S4), representing multiple suppression breaks. After the imagery/suppression period, a prompt asking participants to rate vividness (from 1 to 4, with 4 = strongest vividness) was presented after imagery or failed suppression trials. Participants responded by pressing one of the four buttons on two response boxes. No vividness question was shown after successful suppression trials, which were automatically labeled as having vividness = 0. In failed suppression trials, whenever multiple suppression breaks were reported, we instructed participants to rate the suppression break event with the highest vividness. After reporting the vividness of the visual thought (if required), an intertrial interval of 10 sec followed, during which the word "rest" appeared on the screen. In each 5-min run, three trials of each type (imagine/suppress apple/broccoli) were tested, yielding 12 trials. Trials were pseudorandomized within a run.

Imagery/suppression fMRI task. (A) Imagery/visual thought suppression. Every trial started with a written cue indicating the object to be imagined or suppressed (either a green broccoli or a red apple) for a duration of 2 sec. After this, the task instruction was presented ("Imagine" or "Suppress") for 2 sec. The fixation point remained on the screen for 12 sec, during which the participants tried either to visualize the cued object as vividly as possible or to suppress the visual thought of it. In suppression trials, participants pressed a designated button (the same button irrespective of the object to be suppressed) to report a suppression break event, that is, when the mental image of the object to be suppressed appeared in their minds. In imagery trials and suppression trials with suppression breaks, participants were asked to report the subjective intensity of the visual thought experienced on a vividness scale from 1 = low to 4 = high. In suppression trials with no suppression breaks, the vividness prompt was not shown and vividness for that trial was assigned 0. After every trial, an intertrial interval of 10 sec was observed; a fixation point and the word "rest" were displayed on the screen. (B) Vividness ratings in suppression trials for each participant. Suppression vividness, from 0 = suppression success to 4 = highly vivid suppression break, as the percentage of trials for every participant. Participants had a wide range of suppression break ratios. For analyses comparing suppression success and failure, only participants having at least 25% of suppression breaks or successful suppressions were considered, corresponding to eight participants (marked with a ★). (C) Vividness ratings in imagery trials for each participant. Unlike the vividness ratings for suppression trials, vividness ratings in the imagery trials were more homogeneous across participants. This suggests that the differences across participants in vividness ratings in the suppression conditions correspond to interindividual differences in thought control (Kwok et al., 2019) rather than inconsistencies in the vividness report.


Perception Task

We presented flickering natural images of a broccoli and an apple against a black background at 4.167 Hz, at three different perceptual intensities (40%, 60%, and 80% transparency), to maximize subsequent classifier generalization ability (Bannert & Bartels, 2013). The natural images were retrieved via Google image search (images labeled for reuse with modification) and were presented inside a rectangle (the same as used in the imagery/suppression task; Figure 1) containing a fixation point at the center. Within a 3-min run, we presented the flickering images in a block design, interleaved with fixation periods of 15 sec each (apple: 15 sec, rest: 15 sec, broccoli: 15 sec, rest: 15 sec, etc.). Importantly, participants performed an attention task consisting of detecting a change in fixation point brightness (+70% for 200 msec). Fixation changes were allocated randomly within a run, from one to four instances. Participants were instructed to press any of the four buttons as soon as they detected a change. Participants showed high performance in the detection task (d′ = 2.89 ± 0.15 SEM).

Functional Mapping of Retinotopic Visual Areas

To functionally determine the boundaries of visual areas from V1 to V4 independently for each participant, we used the phase-encoding method (Warnking et al., 2002; Sereno et al., 1995). Double wedges containing dynamic colored patterns cycled through 10 rotations in 10 min (retinotopic stimulation frequency = 0.033 Hz). To ensure deployment of attention to the stimulus during the mapping, participants performed a detection task: pressing a button upon seeing a gray dot anywhere on the wedges.

Experimental Procedures

We performed the three experiments in a single scanning session lasting about 1.5 hr. Stimuli were delivered using an 18-in. MRI-compatible LCD screen (Philips ERD-2, 60-Hz refresh rate) located at the end of the bore. Participants held one dual-button response box in each hand (Lumina, Cedrus), which was used to record all responses. All stimuli were delivered, and responses gathered, using Psychtoolbox 3 (Brainard, 1997; Pelli, 1997) for MATLAB (The MathWorks Inc.) with in-house scripts. Participants' heads were restrained using foam pads and adhesive tape. Each session followed the same structure: first, structural scanning, followed by retinotopic mapping (10 min). Then, the perception task was alternated with the imagery/suppression task until three runs of the perception task were completed (3 min per run). The imagery/suppression task was then repeated until eight runs in total were completed (5 min per run). Pauses were provided between runs. The first four volumes of each functional run were discarded to account for the equilibrium magnetization time, and each functional run started with 10 sec of fixation.

We interleaved perception and imagery/suppression blocks to remove order effects and to increase participants' engagement. For the same reason, we opted against organizing suppression and imagery trials in different blocks. Pseudorandomizing suppression and imagery trials within a block ensured that participants would not ignore the cue in suppression trials, because the task is only revealed after the cue. In addition, by randomizing the order of suppression and imagery trials, we avoided grouping suppression trials together, which had proved exhausting according to postexperiment interviews from our previous study.

Visual ROI Functional Definition

fMRI retinotopic mapping data were analyzed using the fast Fourier transform (FFT) in MATLAB. The FFT was applied voxel-wise across time points. The complex output of the FFT contained both the amplitude and phase information of the sinusoidal components of the BOLD signal. Phase information at the frequency of stimulation (0.033 Hz) was then extracted, thresholded by its amplitude (signal-to-noise ratio ≥ 2), and overlaid on each participant's cortical surface reconstruction obtained using FreeSurfer (Fischl et al., 2004; Fischl, Sereno, & Dale, 1999). We manually delineated the boundaries between retinotopic areas on the flattened surface around the occipital pole by identifying voxels showing phase reversals in the polar angle map, representing the horizontal and vertical visual meridians. In all participants, we clearly defined four distinct visual areas, labeled V1, V2, V3 (specifically its ventral part, known as VP), and V4. All four retinotopic labels were then intersected with activation from the perceptual blocks (broccoli/apple > fixation, p < .001, false discovery rate [FDR] corrected), thus restricting each ROI to the foveal and parafoveal (∼5.5° of visual angle) representations of each visual area.
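
In essence, each voxel's time series is Fourier transformed and the complex coefficient at the 0.033-Hz stimulation frequency is read out. A minimal sketch (the time series and SNR proxy are stand-ins; TR = 2 sec as in the acquisition parameters above):

```python
# Amplitude and phase at the retinotopic stimulation frequency for one voxel.
import numpy as np

TR = 2.0                                   # sec, per the EPI parameters above
ts = np.random.randn(300)                  # stand-in voxel time series

spec = np.fft.rfft(ts - ts.mean())
freqs = np.fft.rfftfreq(ts.size, d=TR)
k = np.argmin(np.abs(freqs - 0.033))       # bin nearest the stimulation frequency

amplitude, phase = np.abs(spec[k]), np.angle(spec[k])   # phase -> polar angle
snr = amplitude / np.median(np.abs(spec[1:]))           # crude SNR estimate
```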

Suppressed Object Information Containing ROI Definition

We used a decoding approach to define ROIs bearing information about the content (apple vs. broccoli) of suppression trials. We used these ROIs to test whether similar representational content was shared between imagery and successful suppression (Figure 4) and between perception and successful suppression (Supplementary Figure S5). To define these ROIs, regressors for apple and broccoli were extracted from every run from the suppression trials (independent of success) using 12-sec boxcars (locked to the beginning of the suppression period). We used a leave-one-run-out cross-validation scheme (see MVPA section for details) and a searchlight approach (3-voxel radius). ROIs containing information about the contents of suppression were defined as those reaching a classification accuracy of Z > 2 (one-sample t test against chance: 50%) at the voxel level. We then corrected for multiple comparisons using cluster-extent-based thresholding employing Gaussian random field theory at p < .05. Only two ROIs satisfied these statistical criteria: one in the inferior frontal gyrus (IFG) and one in the LOC (Figure 4). Importantly, this ROI definition is orthogonal to the target cross-decoding analysis, in which we tested the mutual informational content between imagery and successful suppression, as the training sets of the two analyses are independent of each other: suppression trials for the ROI definition and imagery trials for the cross-decoding analysis.
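
In the leave-one-run-out scheme, each run serves once as the test set while the classifier trains on the remaining runs. A scikit-learn sketch under our own assumptions (data shapes, run structure, and the linear classifier are illustrative; the paper does not specify its classifier here):

```python
# Leave-one-run-out cross-validation keyed by run labels.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

X = np.random.randn(96, 123)            # (trials, searchlight voxels), stand-in
y = np.tile([0, 1], 48)                 # 0 = apple, 1 = broccoli
runs = np.repeat(np.arange(8), 12)      # run index: 8 runs x 12 trials

acc = cross_val_score(LinearSVC(), X, y, groups=runs, cv=LeaveOneGroupOut()).mean()
```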

Lateralization Analysis

LI = −1 represents fully left-lateralized effects, whereas LI = +1 represents fully right-lateralized effects.
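The defining formula is not given here; a standard lateralization index consistent with this sign convention (our assumption, not a statement of the authors' exact definition) is

$$\mathrm{LI} = \frac{R - L}{R + L},$$

where $L$ and $R$ denote the summed significant activation (or the suprathreshold voxel counts) in the left and right hemispheres, respectively.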

FMRI Signal Preprocessing

All data were analyzed using SPM12 (Wellcome Trust Centre for Neuroimaging). We realigned functional images to the first functional volume and applied a high-pass filter (128-sec cutoff) to remove low-frequency drifts in the signal.

Imagery versus Suppression GLM Analysis

Data were spatially normalized into the Montreal Neurological Institute template and spatially smoothed using an FWHM 8-mm 3-D Gaussian kernel. We generated regressors for each condition (imagery and suppression, independent of the imagined/suppressed object) for each run independently. We used boxcar functions of 12 sec, time-locked to the beginning of the imagery or suppression periods, to model each trial with the canonical hemodynamic response function as basis function. Vividness of the imagery and suppression trials was modeled using parametric modulators as it has been shown that brain activations are dependent on the vividness of imagery (Dijkstra, Bosch, & van Gerven, 2017). GLMs were used to test differences between imagery and suppression conditions. Participants' (n = 15) estimates (betas) of the mass-univariate GLM were fed into a second-level two-sample t-test analysis.

Successful versus Failed Suppression GLM Analysis

The analysis was performed as described in the previous paragraph except for the following differences. We generated regressors for successful and failed suppression (independent of the suppressed object) for each run independently. We used boxcar functions of 12 sec (locked to the beginning of the suppression period) and 1 sec (time-locked to the suppression break events) to model successful suppression and suppression breaks, respectively, to capture their respective sustained and transient natures (Mitchell et al., 2007). Only participants having at least 25% of successful or failed suppression trials (n = 8) were considered to have enough data to estimate the parameters.

GLM Analysis for MVPA

Data were analyzed in their native space, without spatial normalization and smoothing, to avoid disrupting information contained in the spatial patterns of activation (Hebart, Görgen, & Haynes, 2015). For the task decoding (Figures 2C and 3B), we estimated GLMs for imagery versus suppression and successful versus failed suppression as described above. For the content decoding (Figure 4), regressors for apple and broccoli were estimated using boxcar functions (15 sec for the perception trials and 12 sec for the imagery and successful suppression conditions). This analysis was performed on the subset of participants having at least 25% of successful or failed suppression trials (n = 8).

Imagery and suppression engage two different networks. (A) Imagery > Suppression contrast. Imagery activations (red) were found in high-hierarchy visual areas (InfTG) bilaterally. Left-lateralized imagery-driven activations were also found in executive areas (InfFS and SupFS) and attention-related areas (IPS). Suppression (blue), on the other hand, was associated with right-lateralized activations in executive (ACC, SupFG, MidFS), high visual (STS), and multimodal (AngG) areas. All activations are at p < .001 (voxel level) and p < .05 cluster-level correction (Gaussian random field theory) for multiple comparisons. (B) LI for the Imagery > Suppression contrast. LI was computed from the significant activations across hemispheres (see Methods section for details). Imagery activations were predominantly left lateralized, mean = −0.35, two-tailed t test, t(14) = 3.57, p = .003, 95% CI [0.14, 0.57] (uncorrected), consistent with previous reports. Suppression-related activations were, on the other hand, predominantly right lateralized, mean = 0.35, two-tailed t test, t(14) = −2.61, p = .02, 95% CI [−0.64, −0.06], uncorrected. LIs for imagery and suppression were significantly different, two-tailed t test, t(14) = 4.1, p = .001, 95% CI [0.34, 1.08]. (C) Task decoding. Visual ROIs contained useful information to reliably classify (above 80% accuracy) imagery from suppression trials, indicating that these conditions engage visual areas differently. V1: 88.36%, one-tailed t test, t(13) = 14.67, p = 10⁻⁶, 95% CI [83.73, inf]; V2: 90.04%, t(13) = 11.38, p = 10⁻⁶, 95% CI [83.8, inf]; V3: 88.66%, t(13) = 13.59, p = 10⁻⁶, 95% CI [83.62, inf]; V4: 80.95%, t(13) = 9.7, p = 10⁻⁶, 95% CI [75.16, inf]; all p values FDR corrected, q = 0.05. Error bars correspond to +1 SEM. AngG = angular gyrus; InfFS = inferior frontal sulcus; InfTG = inferior temporal gyrus; IPS = intraparietal sulcus; MidFS = middle frontal sulcus; PosCS = postcentral sulcus; SupFG = superior frontal gyrus; SupFS = superior frontal sulcus.


Failed suppression is correlated with activations in visual and memory areas. (A) Failed > Successful suppression contrast. Failed suppression was associated with posterior activations along the visual stream in areas such as the MidOC and MedOTS and in memory-related areas such as the ParHC and the hippocampus (not shown). On the other hand, successful suppression was associated with anterior activations in executive areas such as the MidFS, the Opc, and ACC. These results indicate that control over suppressed thoughts relies on the engagement of executive control areas, whereas failure to suppress thoughts is accompanied by hyperactivity of visual and memory-related areas. All results are p < .001 (voxel level) and p < .05 cluster-level correction (Gaussian random field theory) for multiple comparisons. (B) Task decoding. Visual ROIs contained useful information for classifying failed from successful suppression trials. V1: 68.95%, one-tailed t test, t(7) = 6.34, p = 2.66 · 10⁻⁴, 95% CI [63.28, inf]; V2: 71.38%, t(7) = 9.08, p = 4.13 · 10⁻⁵, 95% CI [66.91, inf]; V3: 80.5%, t(7) = 11.3, p = 2.06 · 10⁻⁵, 95% CI [75.39, inf]; V4: 65.87%, t(7) = 4.02, p = .003, 95% CI [58.39, inf]; all p values FDR corrected, q = 0.05. Error bars correspond to +1 SEM. Ins = insula; MidOC = middle occipital cortex; MedOTS = medial occipito-temporal sulcus; Opc = operculum; ParHC = parahippocampal gyrus.


The contents of subjectively successful suppression are decodable using information from imagery. To test whether successfully suppressed thoughts (as reported subjectively) shared informational content with imagery representations, we performed a cross-decoding analysis. We thus attempted to decode the content of successfully suppressed thoughts (broccoli or apple) using classifiers trained on imagery trials, in the two ROIs where the contents of suppression were most readily extractable (ROI threshold Z > 2, Gaussian random field theory–corrected at the cluster level, p < .05; see Methods section for details). The contents of successfully suppressed thoughts were decoded above chance using patterns from imagery trials in the LOC ROI—65.6% accuracy, one-tailed t test, t(7) = 2.53, p = .0443, 95% CI [50.2, inf], FDR-corrected q = 0.05—but not in the IFG ROI (49.48%). These results indicate that subjectively successfully suppressed thoughts contain similar information to imagery representations (arguably visual in nature, as it was contained in visual areas). Error bars correspond to ±1 SEM.


We used a well-established decoding approach to extract condition-related information contained in the pattern of activation across voxels of a given participant, using The Decoding Toolbox (Hebart et al., 2015). For the task decoding (Figures 2C and 3B), we used a leave-one-run-out cross-validation scheme. We trained a linear support vector machine on all runs except one and tested on the remaining one. We repeated this procedure until every run had served as the test set and then averaged the results across folds (eightfold). Using this approach, we tested whether information about the nature of the task could be decoded from functionally defined visual areas (V1 to V4; see Visual ROI Functional Definition section for details). For the content decoding analysis (Figure 4), we employed cross-classification to generalize information between imagery and successful suppression trials. We thus trained on the ensemble of imagery runs and tested on the ensemble of successful suppression trials. The same was done for the perception–successful suppression analysis (Supplementary Figure S5), where the ensemble of perception runs was used for training. No cross-validation was used here, as the data sets were independent; thus, there was no risk of overfitting. We used the ROIs functionally defined as containing suppressed-object information (see Suppressed Object Information Containing ROI Definition section for details) to test for common representational content. Decoding accuracies were averaged across runs and tested against chance level (50%) using a one-sample t test across participants.
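The two schemes can be sketched compactly with scikit-learn (a minimal sketch; variable names such as `X`, `y`, and `runs` are placeholders, and the authors used The Decoding Toolbox rather than this code):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

def leave_one_run_out_accuracy(X, y, runs):
    """X: trials x voxels patterns; y: condition labels; runs: run index
    per trial. Returns mean accuracy across the leave-one-run-out folds."""
    scores = cross_val_score(LinearSVC(), X, y, groups=runs,
                             cv=LeaveOneGroupOut())
    return float(np.mean(scores))

def cross_decode(X_train, y_train, X_test, y_test):
    """Cross-classification: train on one condition (e.g., imagery runs)
    and test on another (e.g., successful suppression trials). No
    cross-validation is needed because the two sets are independent."""
    clf = LinearSVC().fit(X_train, y_train)
    return clf.score(X_test, y_test)
```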

Statistical Analysis on Brain Images

All second-level (across participants) brain statistical images (derived from SPM or decoding) were thresholded at the voxel level at p < .001, as recommended in previous studies (Woo, Krishnan, & Wager, 2014). We then corrected for multiple comparisons using cluster-extent based thresholding employing Gaussian random field theory (Worsley et al., 1996; Friston, Worsley, Frackowiak, Mazziotta, & Evans, 1994) at p < .05, as implemented in the FMRIB Software Library (Smith et al., 2004). Importantly, these thresholds have been shown to keep false-positive rates within nominal levels (Eklund, Nichols, & Knutsson, 2016).


The authors declare that they have no conflicts of interest.

MVPA decoding methods could successfully distinguish between conditions because of differences in the multivariate means of response patterns between conditions or because of differences in the variability of those patterns across trials. Perhaps counterintuitively, the latter could constitute signal rather than noise (e.g. if this information is read out and used by other brain areas). However, differences in response variability between conditions could also easily arise from extraneous factors of the sort described in this section (e.g. differences between conditions in how equally trials are distributed across runs). Thus, it is critically important to minimize factors that could lead to differences in both the mean and variability of response patterns between conditions. See Hebart and Baker (2018) for an in-depth discussion of this issue.

Note that if the trial order is not fully randomized, response patterns should only be compared across runs for both decoding and similarity analyses; see Mumford et al. (2014) for further discussion of this issue.


Materials and Methods

Participants.

Twenty-five healthy volunteers took part in the initial experiment conducted in Trento, and 21 healthy volunteers took part in the control experiment performed in Salzburg. All participants had normal or corrected-to-normal vision and no neurological or psychiatric disorders. Three participants from the initial experiment and one participant from the control experiment were excluded from the analysis due to excessive artifacts in the MEG data, leaving an insufficient number of trials per condition after artifact rejection (fewer than 30 trials for at least one condition). Additionally, within each experiment six participants were discarded from the analysis because their false alarm rate exceeded 30% and/or their near-threshold detection rate was over 85% or below 15% for at least one sensory modality (due to threshold identification failure or difficulty in using the response button mapping during the control experiment, likewise leaving fewer than 30 trials for at least one relevant condition, detected or undetected, in one sensory modality). The remaining 16 participants (11 female; mean age, 28.8 y; SD, 3.4 y) for the initial experiment and 14 participants (9 female; mean age, 26.4 y; SD, 6.4 y) for the control experiment reported normal tactile and auditory perception. The ethics committees of the University of Trento and the University of Salzburg, respectively, approved the experimental protocols, and each participant gave written informed consent.

Stimuli.

To ensure that participants did not hear any auditory cues caused by the piezo-electric stimulator during tactile stimulation, binaural white noise was presented during the entire experiment (training blocks included). Auditory stimuli were presented binaurally using MEG-compatible tubal in-ear headphones (SOUNDPixx; VPixx Technologies). Short bursts of white noise with a length of 50 ms were generated with MATLAB and multiplied with a Hanning window to obtain a soft on- and offset. Participants had to detect short white noise bursts presented near their hearing threshold (27). The intensity of these transient target auditory stimuli was determined prior to the experiment so that they emerged from the constant background white noise stimulation. Visual stimuli were Gabor ellipsoids (tilt, 45°; radius, 1.4°; frequency, 0.1 Hz; phase, 90; sigma of Gaussian, 10) back-projected on a translucent screen by a Propixx DLP projector (VPixx Technologies) at a refresh rate of 180 frames per second. On the black screen background, a centered gray fixation circle (2.5° radius) with a central white dot was used as a fixation point. The stimuli were presented for 50 ms in the center of the screen at a viewing distance of 110 cm. Tactile stimuli were delivered as a 50-ms stimulation to the tip of the left index finger, using one finger module of a piezo-electric stimulator (Quaerosys) with 2 × 4 rods, which can be raised to a maximum of 1 mm. The module was attached to the finger with tape, and the participant's left hand was cushioned to prevent any unintended pressure on the module (28). For the control experiment (conducted in another laboratory, i.e., Salzburg), the visual, auditory, and tactile stimulation setups were identical except that we used a different MEG/MRI vibrotactile stimulator system (CM3; Cortical Metrics).

Task and Design.

The participants performed three blocks of a near-threshold (NT) perception task. Each block included three separate runs (100 trials each), one for each sensory modality: tactile (T), auditory (A), and visual (V). A short break (∼1 min) separated each run, and longer breaks (∼4 min) were provided to the participants after each block. Within a block, runs alternated in the same order within subject, and this order was pseudorandomized across subjects (i.e., subject 1 = TVA-TVA-TVA; subject 2 = VAT-VAT-VAT; …). Participants were asked to fixate on a central white dot in a gray central circle at the center of the screen throughout the whole experiment to minimize eye movements.

A short training run with 20 trials was conducted to ensure that participants had understood the task. Then, in three different training sessions prior to the main experiment, participants’ individual perceptual thresholds (tactile, auditory, and visual) were determined in the shielded room. For the initial experiment, a one-up/one-down staircase procedure with two randomly interleaved staircases (one upward and one downward) was used with fixed step sizes. For the control experiment we used a Bayesian active sampling protocol to estimate psychometric slope and threshold for each participant (62). Once determined by these staircase procedures, all near-threshold stimulation intensities remained stable during each block of the whole experiment for a given participant. All stimulation intensities can be found in SI Appendix, Table S1.
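For intuition, a fixed-step one-up/one-down rule converges on the intensity detected about 50% of the time. Here is a minimal Python sketch of a single staircase (the experiment interleaved two; `detect` is a hypothetical callback standing in for the participant's response):

```python
def one_up_one_down(start, step, n_trials, detect):
    """Fixed-step staircase: intensity decreases after each detection
    and increases after each miss, oscillating around the 50% threshold."""
    intensity, track = start, []
    for _ in range(n_trials):
        seen = detect(intensity)          # True if participant reports the stimulus
        track.append((intensity, seen))
        intensity += -step if seen else step
    return track
```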

The main experiment consisted of a detection task (Fig. 1A). At the beginning of each run, participants were told that on each trial a weak stimulus (tactile, auditory, or visual, depending on the run) could be presented at a random time interval. Five hundred milliseconds after the target stimulus onset, participants were prompted to indicate whether they had felt the stimulus with an on-screen question mark (maximal response time: 2 s). Responses were given using MEG-compatible response boxes with the right index finger and the middle finger (response-button mapping was counterbalanced across participants). Trials were then classified into hits (detected stimulus) and misses (undetected stimulus) according to the participants' answers. Trials with no response were rejected. Catch (above perceptual threshold stimulation intensity) and sham (absent stimulation) trials were used to monitor false alarm and correct rejection rates across the experiment. Overall, there were nine runs with 100 trials each (in total, 300 trials for each sensory modality). Each trial started with a variable interval (1.3 to 1.8 s, randomly distributed) followed by an experimental near-threshold stimulus (80 per run), a sham stimulus (10 per run), or a catch stimulus (10 per run) of 50 ms each. Each run lasted ∼5 min. The whole experiment lasted ∼1 h.

Identical timing parameters were used in the control experiment. However, a specific response screen design was used to control for motor response mapping. On each trial, participants had to use a response mapping determined by the color of the circle surrounding the question mark on the response screen. Two colors (blue or yellow) were used and presented randomly after each trial during the control experiment. One color was associated with the response mapping rule "press the button only if there is a stimulation" (for the near-threshold condition, detected) and the other color with the opposite rule "press a button only if there is no stimulation" (for the near-threshold condition, undetected). The association between a response mapping and a specific color (blue or yellow) was fixed for a single participant but was predefined randomly across participants. Importantly, by delaying the response mapping until after the stimulus presentation in a manner unpredictable to the individual, neural patterns during the relevant periods putatively cannot be confounded by response selection/preparation. Both experiments were programmed in MATLAB using the open source Psychophysics Toolbox (63).

MEG Data Acquisition and Preprocessing.

MEG was recorded at a sampling rate of 1 kHz using a 306-channel (204 first-order planar gradiometers, 102 magnetometers) VectorView MEG system for the first experiment in Trento and a Triux MEG system for the control experiment in Salzburg (Elekta-Neuromag Ltd.) in a magnetically shielded room (AK3B; Vakuumschmelze). Before the experiments, individual head shapes were acquired for each participant, including fiducials (nasion and preauricular points) and around 300 digitized points on the scalp, with a Polhemus Fastrak digitizer. Head positions of the individuals relative to the MEG sensors were continuously monitored within a run using five coils. Head movements did not exceed 1 cm within and between blocks.

Data were analyzed using the Fieldtrip toolbox (64) and the CoSMoMVPA toolbox (65) in combination with MATLAB 8.5 (MathWorks). First, a high-pass filter at 0.1 Hz (FIR filter with transition bandwidth 0.1 Hz) was applied to the continuous data. The data were then segmented from 1,000 ms before to 1,000 ms after target stimulation onset and down-sampled to 512 Hz. Trials containing physiological or acquisition artifacts were rejected. A semiautomatic artifact detection routine identified statistical outliers among trials and channels in the datasets using a set of summary statistics (variance, maximum absolute amplitude, maximum z value). These trials and channels were removed from each dataset. Finally, the data were visually inspected and any remaining trials and channels with artifacts were removed manually. Across subjects, an average of five channels (±2 SD) was rejected. Bad channels were excluded from the whole dataset. A detailed report of the remaining number of trials per condition for each participant can be found in SI Appendix, Table S1. Finally, in all further analyses and within each sensory modality for each subject, an equal number of detected and undetected trials was randomly selected to prevent any bias across conditions (66).
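Equalizing trial counts by random subsampling can be done as follows (a sketch with assumed variable names, not the authors' code):

```python
import numpy as np

def balance_conditions(labels, seed=0):
    """labels: 1-D array of strings ('detected'/'undetected').
    Randomly subsamples the larger class and returns sorted indices
    of the trials to keep, so both classes are equally frequent."""
    rng = np.random.default_rng(seed)
    detected = np.flatnonzero(labels == "detected")
    undetected = np.flatnonzero(labels == "undetected")
    n = min(detected.size, undetected.size)
    keep = np.concatenate([rng.choice(detected, n, replace=False),
                           rng.choice(undetected, n, replace=False)])
    return np.sort(keep)
```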

Source Analyses.

Neural activity evoked by stimulus onset was investigated by computing ERFs. For all source-level analyses, the preprocessed data were low-pass filtered at 30 Hz and projected to source level using a linearly constrained minimum variance (LCMV) beamformer analysis (67). For each participant, realistically shaped, single-shell head models (68) were computed by coregistering the participants' head shapes either with their structural MRI or—when no individual MRI was available (three participants in the initial experiment and two in the control experiment)—with a standard brain from the Montreal Neurological Institute (MNI), warped to the individual head shape. A grid with 1.5-cm resolution based on an MNI template brain was morphed into the brain volume of each participant. A common spatial filter (for each grid point and each participant) was computed using the lead fields and the common covariance matrix, taking into account the data from both conditions (detected and undetected, or catch and sham) for each sensory modality separately. The covariance window for the beamformer filter calculation was based on 200 ms pre- to 500 ms poststimulus. Using this common filter, the spatial power distribution was then estimated for each trial separately. The resulting data were averaged relative to the stimulus onset in all conditions (detected, undetected, catch, and sham) for each sensory modality. Only for visualization purposes was a baseline correction applied to the averaged source-level data, by subtracting a time window from 200 ms prestimulus to stimulus onset. Based on a significant difference between the event-related fields of the two conditions over time for each sensory modality, source localization was performed restricted to specific time windows of interest. All source images were interpolated from the original resolution onto an inflated surface of an MNI template brain available within the Caret software package (69). The respective MNI coordinates and labels of localized brain regions were identified with an anatomical brain atlas (AAL atlas; ref. 70) and a network parcellation atlas (29). Source analysis of MEG data is an inherently underspecified problem, and no unique solution exists. Furthermore, source leakage cannot be avoided, further reducing the accuracy of any analysis. Finally, we remind the reader that we do not expect better than ∼3-cm precision in our results because we used standard LCMV source localization with a 1.5-cm grid. In other words, source plots should be seen as suggestive rather than conclusive evidence for the underlying brain regions.
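An LCMV pipeline of this kind can be expressed with MNE-Python (a sketch under the assumption that `epochs` and a single-shell forward model `fwd` already exist; the authors used Fieldtrip, and the parameter choices below mirror the text rather than their code):

```python
import mne
from mne.beamformer import make_lcmv, apply_lcmv_epochs

# Covariance over the window described in the text (200 ms pre- to 500 ms poststimulus)
data_cov = mne.compute_covariance(epochs, tmin=-0.2, tmax=0.5, method="empirical")

# One common spatial filter per participant, applied to every trial
filters = make_lcmv(epochs.info, fwd, data_cov, reg=0.05, pick_ori="max-power")
stcs = apply_lcmv_epochs(epochs, filters)   # one source estimate per trial
```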

MVPA Decoding.

MVPA decoding was performed for the period 0 to 500 ms after stimulus onset based on normalized (z-scored) single-trial source data down-sampled to 100 Hz (i.e., time steps of 10 ms). We used multivariate pattern analysis as implemented in CoSMoMVPA (65) to identify when, and in what form, a common network across sensory modalities is activated during the near-threshold detection task. We defined two classes for the decoding, corresponding to the behavioral outcome of the task (detected and undetected). For decoding within the same sensory modality, single-trial source data were randomly assigned to one of two chunks (each containing half of the original data).

For decoding across all sensory modalities together, single-trial source data were pseudorandomly assigned to one of the two chunks, with half of the original data for each sensory modality in each chunk. Data were classified using a twofold cross-validation procedure, where a Bayes-naive classifier predicted trial conditions in one chunk after training on data from the other chunk. For decoding between different sensory modalities, single-trial source data of one modality were assigned to the testing chunk and the trials from the other modalities were assigned to the training chunk. The number of target categories (e.g., detected/undetected) was balanced in each training partition and for each sensory modality. Trials were partitioned equally into the chunks (i.e., each chunk used for classification contained the same number of trials from the three runs and blocks for each modality). Training and testing partitions always contained different sets of data.
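The twofold chunk scheme reduces to a short loop (a sketch with hypothetical variable names; CoSMoMVPA handles this bookkeeping internally):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def twofold_accuracy(X, y, chunks):
    """X: trials x features source data; y: detected/undetected labels;
    chunks: array of 0s and 1s assigning each trial to a chunk.
    Train on one chunk, test on the other, then swap, and average."""
    scores = []
    for test_chunk in (0, 1):
        train, test = chunks != test_chunk, chunks == test_chunk
        clf = GaussianNB().fit(X[train], y[train])
        scores.append(clf.score(X[test], y[test]))
    return float(np.mean(scores))
```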

First, the temporal generalization method was used to explore the ability of each classifier trained at one time point in the training set to generalize to every time point in the testing set (21). In this analysis we used local neighborhood features in time (time radius of 10 ms: for each time step we included the previous and next time sample data points as additional features). We generated temporal generalization matrices of task decoding accuracy (detected/undetected), mapping the time at which the classifier was trained against the time at which it was tested. Generalization of decoding accuracy over time was calculated for all trials, separately for each between- or within-sensory-modality decoding condition. The reported average accuracy of the classifier for each time point corresponds to the group average of the individual effect size: the ability of classifiers to discriminate detected from undetected trials. We summarized time generalization by keeping only significant accuracies for each sensory modality decoding. Significant classifier accuracies were normalized between 0 and 1,

$$y_t = \frac{x_t - \min(x)}{\max(x) - \min(x)}, \qquad [1]$$

where $x$ is the vector of all significant decoding accuracies and $x_t$ is a given significant accuracy at time $t$. Normalized accuracies ($y_t$) were then averaged across significant testing times and decoding conditions. The number of significant classifier generalizations across testing time points and the corresponding averaged normalized accuracies were reported along the training time dimension (Figs. 3B and 5B). For all significant time points previously identified, we performed a "searchlight" analysis across brain sources and a time neighborhood structure. In this analysis we used local neighborhood features in source and time space, with a time radius of 10 ms and a source radius of 3 cm. All significant searchlight accuracy results were averaged over time, and only the top 10% of significant accuracies were reported on brain maps for each sensory modality decoding condition (Fig. 4) or for all conditions together (Fig. 5C).

Finally, we applied the same type of analysis to all sensory modalities by taking all blocks together with detected and undetected NT trials (equalized within each sensory modality). For the control experiment, we equalized trials based on the 2 × 2 design of detection report (detected or undetected) and type of response ("button press = response" or "no response"), so that each category (i.e., class) contained the same number of trials for each sensory modality. We performed a similar decoding analysis using a different class definition: either detected vs. undetected or response vs. no response (SI Appendix, Fig. S3 B and C).

Statistical Analysis.

Detection rates for the experimental trials were statistically compared to those from the catch and sham trials, using dependent samples t tests. Concerning the MEG data, the main statistical contrast was between trials in which participants reported a stimulus detection and trials in which they did not (detected vs. undetected).

The evoked response at the source level was tested at the group level for each of the sensory modalities. To eliminate polarity, statistics were computed on the absolute values of source-level event-related responses. Based on the global average of all grid points, we first identified relevant time periods with maximal difference between conditions (detected vs. undetected) by performing group analysis with sequential dependent t tests between 0 and 500 ms after stimulus onset using a sliding window of 30 ms with 10 ms overlap. P values were corrected for multiple comparisons using Bonferroni correction. Then, to derive the contributing spatial generators of this effect, the conditions detected and undetected were contrasted for the specific time periods with group statistical analysis using nonparametric cluster-based permutation tests with Monte Carlo randomization across grid points controlling for multiple comparisons (71).
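The sliding-window contrast can be sketched as follows (illustrative; we read "30 ms with 10 ms overlap" as windows stepped by 20 ms, which is an assumption, and array names are placeholders):

```python
import numpy as np
from scipy.stats import ttest_rel

def sliding_window_ttest(detected, undetected, times, win=0.030, step=0.020):
    """detected/undetected: subjects x timepoints arrays of absolute
    source activity averaged over grid points; times in seconds.
    Returns (window start, t, Bonferroni-corrected p) per window."""
    starts = np.arange(times[0], times[-1] - win, step)
    out = []
    for t0 in starts:
        sel = (times >= t0) & (times < t0 + win)
        t, p = ttest_rel(detected[:, sel].mean(axis=1),
                         undetected[:, sel].mean(axis=1))
        out.append((t0, t, min(1.0, p * len(starts))))
    return out
```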

The multivariate searchlight analysis results discriminating between conditions were tested at the group level by comparing the resulting individual accuracy maps against chance level (50%) using a nonparametric approach implemented in CoSMoMVPA (65), adopting 10,000 permutations to generate a null distribution. P values were set at P < 0.005 for cluster-level correction to control for multiple comparisons using a threshold-free method for clustering (72), which has been used and validated for MEG/EEG data (40, 73). The time generalization results at the group level were thresholded using a mask with corrected z score > 2.58 (or P_corrected < 0.005) (Figs. 3A and 5A). Time points exceeding this threshold were identified and reported for each training data time course to visualize how long time generalization was significant over testing data (Figs. 3B and 5B). Significant accuracy brain maps resulting from the searchlight analysis on previously identified time points were reported for each decoding condition. The maximum 10% of averaged accuracies were depicted for each significant decoding cluster on brain maps (Figs. 4 and 5).

Data Availability.

A down-sampled (to 100 Hz) version of the data is available at the OSF public repository (https://osf.io/E5PMY/). The original nonresampled raw data are available, upon reasonable request, from the corresponding author. Data analysis code is available at the corresponding author’s GitLab repository (https://gitlab.com/gaetansanchez).


Results

Behavioral results

All participants identified catch trials with high accuracy. Mean error rates were 5.0 ± 0.8% (SEM). Reaction times for correct responses (measured with respect to video onset) were 1953 ± 25 ms (SEM).

Univariate fMRI results

To determine ROIs for subsequent MVPA, we computed a group contrast of all eight conditions (Action × Object Category × Kinematics) versus baseline (see Materials and Methods, ROI definition). This revealed widespread activations within left and right ventral and dorsal premotor cortex, intraparietal sulcus (IPS), and occipitotemporal cortex extending dorsally into posterior IPS and ventrally into middle and inferior temporal gyrus. Peak Talairach coordinates identified in the group contrast for the ROI MVPA were as follows: −47/0/27 (left PMv), 53/0/36 (right PMv), −43/−36/39 (left IPL), 35/−35/46 (right IPL), −43/−69/−2 (left LOTC), and 43/−65/1 (right LOTC).

In addition, we computed a univariate contrast "open" versus "close" (collapsed across object category and kinematics) to test for putative univariate effects. This contrast revealed no significant effects (even at a very liberal voxel-level threshold of p = 0.05). The lack of significant differences in the univariate contrast suggests that activation levels were comparable across the two actions.

ROI MVPA results

In an ROI-based MVPA, we investigated the degree of generality of action representations (see Materials and Methods for details of the procedure) in regions typically associated with action observation, i.e., PMv, IPL, and LOTC (Fig. 2).

ROI MVPA results. Mean classification accuracies for decoding at concrete (red), intermediate (green), and abstract (blue) levels. Error bars indicate SEM; asterisks indicate statistical significance (different from 50% = chance; red = FDR corrected for the number of tests). The dotted line represents decoding accuracy at chance = 50%.

In IPL and LOTC, we could decode actions at all levels of abstraction, while in PMv, we could only decode actions at the concrete level (FDR corrected α = 0.034). A three-way repeated-measures ANOVA with the factors Abstraction Level × ROI × Hemisphere revealed main effects of Abstraction Level (F(2,378) = 10.23, p < 0.001) and ROI (F(2,378) = 18.93, p < 0.001). No effects of Hemisphere and no interactions were observed (all p > 0.1). Post hoc paired samples t tests revealed that decoding accuracies for intermediate and abstract levels were significantly higher in IPL compared with PMv and in LOTC compared with PMv (Table 1). In addition, accuracies in PMv differed significantly between concrete and abstract levels (Table 2). These results demonstrate that IPL and LOTC, but not PMv, encode actions at abstract levels of representation.

Results of post hoc paired samples t test between ROIs

Results of post hoc paired samples t test between abstraction levels

A second observation is that LOTC showed significantly stronger decoding for the concrete compared with the intermediate and abstract levels, whereas IPL showed relatively similar decoding accuracies across all levels (Table 2). This indicates that LOTC contains both concrete and more abstract representations, whereas IPL contains abstract action representations only, suggesting that generalization from perceptual to conceptual action representations takes place in LOTC (see Materials and Methods, ROI MVPA for a detailed description of expected patterns of results).

Finally, in all regions, decoding accuracies for intermediate and abstract levels were at similar levels and did not show significant differences (Table 2), suggesting that generalization from concrete (object-specific) to abstract (object category-independent) action representations does not require an additional, intermediate (object-independent but object category-specific) abstraction step (see Materials and Methods, ROI MVPA for a detailed description of expected patterns of results).

Searchlight MVPA results

A searchlight analysis corroborated the findings of the ROI MVPA (Figs. 3, 4, Table 3): at the concrete level, we decoded actions in both hemispheres throughout the occipitotemporal cortex, postcentral sulcus (PoCS), IPS, and ventral as well as dorsal premotor cortex. At intermediate and abstract levels, we decoded actions in bilateral posterior middle temporal gyrus (pMTG)/inferior temporal sulcus (pITS) and PoCS (at the junction to anterior IPS), but not in areas anterior to the postcentral sulcus.

Mean accuracy maps of the searchlight MVPA at each abstraction level (concrete, intermediate, and abstract). Individual accuracy maps were cortex-based aligned, averaged, and projected onto a common group surface (both flat maps and lateral views of inflated hemispheres). Decoding accuracy at chance is 50%. CS, central sulcus; IFS, inferior frontal sulcus; IPS, intraparietal sulcus; ITS, inferior temporal sulcus; PrCS, precentral sulcus; PoCS, postcentral sulcus; SFS, superior frontal sulcus; STS, superior temporal sulcus.

Statistical maps of the searchlight MVPA. For intermediate and abstract levels, conjunctions (i.e., lowest common t value per vertex) of concrete/intermediate and concrete/intermediate/abstract levels, respectively, were used (see Materials and Methods for details). Alignment and projection procedures are the same as in Figure 3. Outlines around clusters indicate clusters surviving cluster size correction (dark red, concrete; dark green, intermediate; dark blue, abstract; thresholded at p = 0.005, corrected cluster threshold p = 0.05). CS, central sulcus; IFS, inferior frontal sulcus; IPS, intraparietal sulcus; ITS, inferior temporal sulcus; PrCS, precentral sulcus; PoCS, postcentral sulcus; SFS, superior frontal sulcus; STS, superior temporal sulcus.

Clusters identified in the searchlight MVP analysis for action decoding at concrete, intermediate, and abstract levels


METHODS

Functional Experiment

Participants

Twenty participants were recruited from the University of York. One participant's data were excluded because of excessive motion artifacts, leaving 19 participants in the final analysis (11 women; mean age = 23.67 years, range = 18–37 years). Participants were native British speakers, right-handed, and had normal or corrected-to-normal vision. Participants gave written informed consent and were reimbursed for their time. The study was approved by the York Neuroimaging Centre ethics committee at the University of York.

Design

The functional experiment contained six experimental conditions, in a 2 (concept: dog, car) × 3 (imagery type: auditory, visual, and conceptually complex context) design (see Supplementary Material A2 for the full list of experimental conditions).

Stimuli

Experimental stimuli consisted of (i) six verbal conceptual prompts referring to each of our six experimental conditions (e.g., Dog Sound, which cued participants to imagine what a dog sounded like); (ii) visual and auditory noise presented throughout experimental conditions and rest periods, with Gaussian visual noise generated in PsychoPy (version 2.7) and auditory white noise generated in Audacity (version 2.0.0); and (iii) target images/sounds. The targets used in this paradigm were piloted before fMRI scanning on a separate group of participants (n = 24) to determine the average length of time taken to detect a target (image or sound) emerging through noise (see Supplementary Material A1 and Table A2 for a full description of the pilot experiment). From this pilot, 10 images were selected for each of our six experimental conditions (Dog Visual-Features, Car Visual-Features, Dog Sound, Car Sound, Dog Context, and Car Context) based on statistically similar RTs for detecting the item emerging through noise (see Supplementary Material A3 for a full list of stimuli). Images were detected, on average, at 2861 msec and sounds at 2912 msec (see Table 1). The fMRI scan, therefore, allowed 3000 msec for participants to detect whether an item emerging through noise matched the content of their imagery.

fMRI Experiment

Condition      RT (msec)     Acc (%)
Car sound      2748 (713)    82.11 (16.53)
Dog sound      2753 (552)    76.84 (12.04)
Car visual     2704 (204)    83.68 (11.64)
Dog visual     2620 (241)    82.63 (9.91)
Car context    2754 (211)    76.76 (12.62)
Dog context    2569 (250)    79.61 (14.71)

Standard deviation in parentheses. Acc = percentage accuracy.

Task Procedure

Before being scanned, participants completed a practice session identical to one scanning run. After this practice run, participants were probed to describe what they had been focused on during the pure-noise trials, to ensure they were imagining the relevant concepts. For the in-scanner task, stimuli were presented in four independent runs. Within each scanning run, participants were presented with a cue word (e.g., Sound DOG) and instructed to imagine this concept in the presence of visual and auditory noise; for instance, they were told to imagine the sound of a dog barking, growling, yelping, and so forth. They were asked to continue imagining the relevant visual or auditory properties until the stimulus appeared. Task instructions were presented for 3 sec. A variable number of images then followed, each displaying visual and auditory noise (see Figure 1). Within the blocks, the pure-noise images were each shown for 3 sec. Following a variable length of time (between 6 and 12 sec after the initial cue), a target image or sound began to emerge through the noise (at the rate outlined in the pilot experiment described above). Participants were instructed to respond with a button press (yes/no) indicating whether the target item emerging through the visual and auditory noise was related to what they had been imagining based on the cue word. Participants were given 3000 msec to respond to this item. The block ended automatically after this image. This design afforded us the high signal sensitivity of block designs, combined with unpredictability to keep participants cognitively engaged.

The basic design of the task, in which stimuli appeared through noise, was adapted from a previous fMRI study examining internally generated conceptual retrieval (see Coutanche & Thompson-Schill, 2015). One advantage of this method is that it required participants to sustain and search for a particular concept over time, ensuring that several seconds of data were available per trial for the decoding analyses. In addition, because the onset of the stimulus was subtle within the visual and auditory noise, participants were encouraged to engage in semantically driven retrieval before the presentation of the target.

Each experimental condition (e.g., "Dog Sound") occurred twice in a run (giving eight blocks for each condition across the experiment). Blocks were presented in a pseudorandomized order so that the same cue did not immediately repeat, and blocks were separated by 12-sec fixation periods. During the fixation period, the visual and auditory noise continued, creating an active baseline. Half of the items emerging through noise did not match the preceding cue (i.e., four of eight were foils), to ensure that participants focused on the specific target. To encourage participants to pay attention from the very start of every block, an additional short block was included in each run, in which an item emerged through noise after only 3 sec, followed by 12 sec of fixation. These blocks were disregarded in the analysis.

Acquisition

Data were acquired using a GE 3T HD Excite MRI scanner at the York Neuroimaging Centre, University of York. A Magnex head-dedicated gradient insert coil was used in conjunction with a birdcage radio-frequency coil tuned to 127.4 MHz. A gradient-echo EPI sequence was used to collect data from 38 bottom–up axial slices aligned with the temporal lobe (repetition time [TR] = 2 sec, echo time [TE] = 18 msec, field of view [FOV] = 192 × 192 mm, matrix size = 64 × 64, slice thickness = 3 mm, slice gap = 1 mm, flip angle = 90°). Voxel size was 3 × 3 × 3 mm. Functional images were coregistered onto a T1-weighted anatomical image from each participant (TR = 7.8 sec, TE = 3 msec, FOV = 290 × 290 mm, matrix size = 256 × 256, voxel size = 1.13 × 1.13 × 1 mm) using linear registration (FLIRT, FMRIB Software Library [FSL]). This sequence was chosen because previous studies employing it have produced an adequate signal-to-noise ratio in regions prone to signal dropout, such as ATL (e.g., Murphy et al., 2017; Coutanche & Thompson-Schill, 2015).

To ensure that our ROIs had sufficient signal to detect reliable fMRI activation, the temporal signal-to-noise ratio (tSNR) for each participant was calculated by dividing the mean signal in each voxel by the standard deviation of the residual error time series in that voxel (Friedman, Glover, & The FBIRN Consortium, 2006). tSNR values were averaged across the voxels in both ATL and mPFC, regions that suffer from signal loss and distortion due to their proximity to air-filled sinuses (Jezzard & Clare, 1999). Mean tSNR values, averaged across participants, were as follows: ATL, 82.85; mPFC, 97.14. The percentage of voxels in each ROI with "good" tSNR values (>20; Binder et al., 2011) was above 97% for all ROIs: ATL, 97.19%; mPFC, 99.24%. These values indicate that the tSNR was sufficient to detect reliable fMRI activation in all ROIs (Binder et al., 2011).
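The tSNR computation described above amounts to a voxel-wise division (a sketch; file names are hypothetical placeholders):

```python
import numpy as np
import nibabel as nib

func = nib.load("filtered_func_data.nii.gz").get_fdata()   # x, y, z, t
resid = nib.load("res4d.nii.gz").get_fdata()               # residual error time series
mask = nib.load("ATL_mask.nii.gz").get_fdata().astype(bool)

sd = resid.std(axis=-1)
tsnr = np.zeros(func.shape[:3])
np.divide(func.mean(axis=-1), sd, out=tsnr, where=sd > 0)  # mean signal / residual SD

mean_tsnr = tsnr[mask].mean()
good_fraction = (tsnr[mask] > 20).mean()   # proportion of 'good' voxels (tSNR > 20)
```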

Preprocessing

Imaging data were preprocessed using the FSL toolbox (www.fmrib.ox.ac.uk/fsl). Images were skull-stripped using a brain extraction tool (Smith, 2002) to remove nonbrain tissue from the image. The first five volumes (10 sec) of each scan were removed to minimize the effects of magnetic saturation, and slice-timing correction was applied. Motion correction (MCFLIRT; Jenkinson, Bannister, Brady, & Smith, 2002) was followed by temporal high-pass filtering (cutoff = 0.01 Hz). Individual participant data were first registered to their high-resolution T1 anatomical image and then into standard space (Montreal Neurological Institute, MNI152); this process included trilinear interpolation of voxel sizes to 2 × 2 × 2 mm. For univariate analyses, data were additionally smoothed (Gaussian FWHM 6 mm).

Multivariate Pattern Analysis

Analysis was focused on the moments when participants were imagining the target cues (e.g., thinking about what a dog looked like or what a car sounded like). The condition onset and duration were taken from the first pure-noise trial in each block (after the initial cue) to the end of the last pure-noise trial (before the item began to emerge through the noise). The response to each of the six conditions was contrasted against the active rest baseline (periods of auditory and visual noise where participants were not cued to imagine concepts). Box-car regressors for each condition, for each run, in the general linear model were convolved with a double gamma hemodynamic response function (FEAT, FSL). Regressors of no interest were also included to account for head motion within scans. MVPA was conducted on spatially unsmoothed data to preserve local voxel information. For each voxel in the brain, we trained a linear support vector machine (LIBSVM, with fixed regularization hyperparameter C = 1) using fourfold leave-one-run-out cross-validation, implemented in custom Python scripts using the pyMVPA software package (Hanke et al., 2009). A support vector machine was chosen to combat overfitting by limiting the complexity of the classifier (Lewis-Peacock & Norman, 2014). The classifier was trained on three runs and tested on the independent fourth run; the testing set was then alternated across the four iterations. Classifiers were trained and tested on individual participant data transformed into MNI standard space. The functional data were first z-scored per voxel within each run. The searchlight analysis was implemented by extracting the z-scored β values from spheres (6-mm radius) centered on each voxel in the masks. A sphere of this size included ∼123 3-mm voxels (when not restricted by the brain's boundary; Kriegeskorte, Goebel, & Bandettini, 2006). Classification accuracy (proportion of correctly classified trials) for each sphere was assigned to the sphere's central voxel to produce accuracy maps. The resulting accuracy maps were then smoothed with a Gaussian kernel (6-mm FWHM). To determine whether accuracy maps were above chance levels (50%), individual accuracy maps were entered into a higher-level group analysis (mixed effects, FLAME; www.fmrib.ox.ac.uk/fsl), testing the accuracy values across participants against chance for each voxel. Voxel inclusion was set at z = 2.3, with a cluster significance threshold of FWE p < .01.
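A comparable searchlight can be run with nilearn (a sketch, not the authors' pyMVPA pipeline; `beta_img`, `mask_img`, `labels`, and `runs` are assumed inputs):

```python
from nilearn.decoding import SearchLight
from sklearn.model_selection import LeaveOneGroupOut

# 6-mm-radius spheres, linear SVM, leave-one-run-out cross-validation
searchlight = SearchLight(mask_img, radius=6.0, estimator="svc",
                          cv=LeaveOneGroupOut(), n_jobs=-1)
searchlight.fit(beta_img, labels, groups=runs)
accuracy_map = searchlight.scores_   # per-voxel accuracy assigned to sphere centers
```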

The following classification tests were performed: (1) Car versus Dog classifier: This examined whether patterns of activity conveyed information about conceptual identity by training a classifier to discriminate between periods of noise where participants were thinking about a dog and periods where they were thinking about a car. We were not able to successfully classify the semantic class (dog vs. car) in our data set at the whole-brain level. As this analysis revealed that no regions across the cortex could successfully decode this information, the remaining classification tests combined car and dog trials. (2) Auditory versus visual classifier: This examined whether patterns of activity conveyed information about the modality of imagery by training a classifier to discriminate between periods of noise where participants were thinking about the visual properties of objects and periods where they were thinking about the auditory properties of objects. (3) Visual versus context classifier: Here, a classifier was trained to discriminate between periods of noise where participants were thinking about the visual properties of objects and periods when they were thinking about objects in more complex conceptual contexts. (4) Auditory versus context classifier: Here, a classifier was trained to discriminate between periods of noise where participants were thinking about the auditory properties of objects and periods when they were thinking about objects in complex contexts. Unthresholded maps from all analyses are available on NeuroVault: neurovault.org/collections/2671/.

Next, we identified regions where patterns of activity consistently informed the classifier for each of our three conditions (visual, auditory, and context) by running a formal conjunction on the uncorrected searchlight maps (using the FSL easythresh command). For visual patterns, we looked at the conjunction of the two searchlight maps that decoded visual properties (visual vs. auditory and visual vs. context). Because regions contributing to both of these searchlight maps were able to decode simple visual features in imagination, relative to both auditory features and more complex contexts, we reasoned that their pattern of activation related to simple visual features. Next, we looked at the conjunction of the two searchlight maps that decoded the auditory condition (auditory vs. visual and auditory vs. context) to identify brain regions containing patterns of activation relating to simple auditory properties in imagination. Finally, we looked at the conjunction of the two searchlight maps that decoded context properties (context vs. visual and context vs. auditory). This identified brain regions containing activation patterns relating to complex conceptual contexts, as distinct from both simple visual and auditory features. All analyses were cluster-corrected using a z-statistic threshold of 2.3 to define contiguous clusters. Multiple comparisons were controlled using Gaussian random field theory at a threshold of p < .01.

Univariate Analysis

We examined univariate activation to further characterize the response within our unimodal and transmodal regions defined by MVPA. The percent signal change was extracted for each condition from ROIs defined by the MVPA conjunctions (see above).

Resting-state fMRI

Participants

This analysis was performed on a separate cohort of 157 healthy participants at York Neuroimaging Centre (89 women; mean age = 20.31 years, range = 18–31 years). Participants completed a 9-min functional connectivity MRI scan during which they were asked to rest in the scanner with their eyes open. Using these data, we examined the resting-state fMRI connectivity of our conjunction regions that were informative for decoding visual, auditory, and contextual imagery, to investigate whether these regions fell within similar or distinct networks. The data from our resting-state scans have been used in prior published work from the same lab (e.g., Murphy et al., 2017, 2018; Villena-Gonzalez et al., 2018; Wang et al., 2018; Poerio et al., 2017; Sormaz et al., 2017; Vatansever et al., 2017).

Acquisition

As with the functional experiment, a Magnex head-dedicated gradient insert coil was used in conjunction with a birdcage radio-frequency coil tuned to 127.4 MHz. For the resting-state data, a gradient-echo EPI sequence was used to collect data from 60 axial slices with an interleaved (bottom–up) acquisition order and the following parameters: TR = 3 sec, TE = minimum full, volumes = 180, flip angle = 90°, matrix size = 64 × 64, FOV = 192 × 192 mm, voxel size = 3 × 3 × 3 mm. A minimum-full TE was selected to optimize image quality (as opposed to selecting a value less than minimum full, which, for instance, would be beneficial for obtaining more slices per TR). Functional images were coregistered onto a T1-weighted anatomical image from each participant (TR = 7.8 sec, TE = 3 msec, FOV = 290 × 290 mm, matrix size = 256 × 256, voxel size = 1 × 1 × 1 mm).

Preprocessing

Data were preprocessed using the FSL toolbox (www.fmrib.ox.ac.uk/fsl). Before conducting the functional connectivity analysis, the following prestatistics processing was applied to the resting-state data: motion correction using MCFLIRT to safeguard against motion-related spurious correlations; slice-timing correction using Fourier space time-series phase shifting; nonbrain removal using a brain extraction tool; spatial smoothing using a Gaussian kernel of FWHM 6 mm; grand mean intensity normalization of the entire 4-D data set by a single multiplicative factor; high-pass temporal filtering (Gaussian-weighted least squares straight line fitting, with sigma = 100 sec); and Gaussian low-pass temporal filtering, with sigma = 2.8 sec.

Low-level Analysis

For each conjunction site, we created a spherical seed ROI, 6 mm in diameter, centered on the peak conjunction voxel: the visual conjunction site in the left inferior lateral occipital cortex [−48 −70 −2], the auditory conjunction site in the left superior temporal gyrus [−48 −12 −10], and the context conjunction site in the left lateral occipital cortex [−48 −60 0] (see Supplementary Table A2). This ensured that we assessed the functional connectivity of a key site even when the searchlight conjunction revealed a large cluster or multiple clusters. The time series of these regions were extracted and used as explanatory variables in a separate participant-level functional connectivity analysis for each seed. Participant-specific nuisance regressors were determined using a component-based noise correction (CompCor) approach (Behzadi, Restom, Liau, & Liu, 2007). This method applies PCA to the fMRI signal from participant-specific white matter (WM) and CSF ROIs. In total, there were 11 nuisance regressors: five principal components from CompCor and a further six motion parameters from the MCFLIRT motion correction. These nuisance time series were then removed from the fMRI data through linear regression. The WM and CSF covariates were generated by segmenting each individual's high-resolution structural image (using FAST in FSL; Zhang, Brady, & Smith, 2001). The default tissue probability maps, referred to as prior probability maps, were registered to each individual's high-resolution structural image (T1 space), and the overlap between these prior probability maps and the corresponding CSF and WM maps was identified. These maps were then thresholded (40% for the CSF and 66% for the WM), binarized, and combined. The six motion parameters were calculated in the motion correction step during preprocessing: movement in each of the three Cartesian directions (x, y, z) and rotational movement around three axes (pitch, yaw, roll) were included for each individual.
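A minimal numpy sketch of the CompCor step, assuming the WM/CSF voxel time series and the six MCFLIRT motion parameters have already been extracted (array shapes and variable names are hypothetical; an intercept column is added here for centering, on top of the 11 nuisance regressors named above):

import numpy as np

def compcor_clean(data, wm_csf, motion, n_components=5):
    # data:   (n_timepoints, n_voxels) preprocessed resting-state time series
    # wm_csf: (n_timepoints, n_noise_voxels) signal from the combined WM/CSF mask
    # motion: (n_timepoints, 6) MCFLIRT translation and rotation parameters

    # PCA on the noise-ROI signal: center, then take the top left singular
    # vectors as the CompCor nuisance time series.
    noise = wm_csf - wm_csf.mean(axis=0)
    u, s, vt = np.linalg.svd(noise, full_matrices=False)
    compcor = u[:, :n_components]

    # Nuisance model: 5 CompCor components + 6 motion parameters + intercept.
    X = np.column_stack([compcor, motion, np.ones(len(data))])

    # Regress the nuisance model out of every voxel's time series.
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta

# Placeholder inputs so the sketch runs end to end (180 TRs, as in the text).
rng = np.random.default_rng(0)
data   = rng.standard_normal((180, 5000))   # hypothetical voxel time series
wm_csf = rng.standard_normal((180, 800))    # hypothetical noise-ROI signal
motion = rng.standard_normal((180, 6))      # hypothetical motion parameters
cleaned = compcor_clean(data, wm_csf, motion)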

High-level Analysis

At the group level, the data were processed using FEAT Version 5.98 within FSL (www.fmrib.ox.ac.uk/fsl), and the analyses were carried out using FMRIB's Local Analysis of Mixed Effects (FLAME) Stage 1 with automatic outlier detection. No global signal regression was performed. The z-statistic images were then thresholded using clusters determined by z > 2.3 and a cluster-corrected significance threshold of p = .01. Finally, to determine whether our connectivity maps overlapped with one another, we calculated the number of overlapping voxels across the three conjunction-site connectivity maps.
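That final step reduces to counting voxels that survive thresholding in more than one seed map. A minimal sketch (file names and the z > 2.3 threshold carried over from the text are illustrative):

import nibabel as nib
import numpy as np

maps = ["conn_visual_z.nii.gz", "conn_auditory_z.nii.gz", "conn_context_z.nii.gz"]
masks = [nib.load(m).get_fdata() > 2.3 for m in maps]  # thresholded z-maps

# Pairwise overlap: number of voxels suprathreshold in both maps.
for i in range(3):
    for j in range(i + 1, 3):
        print(maps[i], maps[j], "overlap:", int(np.sum(masks[i] & masks[j])))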



Decoding as a special case of reverse inference (Bayes' formula):

p(m|a) = [p(a|m∩task) × p(m|task)] / [p(a|m∩task) × p(m|task) + p(a|¬m∩task) × p(¬m|task)]

Here m is the mental process of interest and a is the observed activation pattern. The posterior p(m|a) depends not only on how reliably m produces a, but also on the prior p(m|task); neglecting that prior is the core of the reverse-inference criticism.
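A hypothetical worked example makes the base-rate problem concrete: even when an area responds fairly reliably whenever m is engaged, a low prior p(m|task) keeps the posterior modest. All numbers below are illustrative, not taken from any study:

# Reverse-inference posterior p(m|a, task) from Bayes' rule.
p_a_given_m     = 0.80   # activation likelihood when m is engaged (illustrative)
p_a_given_not_m = 0.30   # activation likelihood when m is not engaged
p_m             = 0.10   # prior that the task engages m at all

posterior = (p_a_given_m * p_m) / (
    p_a_given_m * p_m + p_a_given_not_m * (1 - p_m)
)
print("p(m|a, task) = %.2f" % posterior)   # ~0.23: activation alone is weak evidence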