A summary of my current views on moral theory and the value of AI
I am essentially a preference utilitarian and an illusionist regarding consciousness. This combination of views leads me to conclude that future AIs will very likely have moral value if they develop into complex agents capable of long-term planning, and are embedded within the real world. I think such AIs would have value even if their preferences look bizarre or meaningless to humans, as what matters to me is not the content of their preferences but rather the complexity and nature of their minds.
When deciding whether to attribute moral patienthood to something, my focus lies primarily on observable traits, cognitive sophistication, and most importantly, the presence of clear open-ended goal-directed behavior, rather than on speculative or less observable notions of AI welfare, about which I am more skeptical. As a rough approximation, my moral theory aligns fairly well with what is implicitly proposed by modern economists, who talk about revealed preferences and consumer welfare.
Like most preference utilitarians, I believe that value is ultimately subjective: loosely speaking, nothing has inherent value except insofar as it reflects a state of affairs that aligns with someone’s preferences. As a consequence, I am comfortable, at least in principle, with a wide variety of possible value systems and future outcomes. This means that I think a universe made of only paperclips could have value, but only if that’s what preference-having beings wanted the universe to be made out of.
To be clear, I think existing people have value too, so this isn’t an argument for blind successionism. It would also be dishonest not to admit that I am selfish to a significant degree (along with almost everyone else on Earth). What I have just described simply reflects my broad moral intuitions about what has value in our world from an impartial point of view, not a prescription that we should tile the universe with paperclips. Since humans and animals are currently the main preference-having beings in the world, at the moment I care most about fulfilling what they want the world to be like.
I agree that this sort of preference utilitarianism leads you to think that long-run control by an AI which just wants paperclips could be some (substantial) amount good, but I think you’d still have strong preferences over different worlds.[1] The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities. (And if you agreed with this claim about large variation, then I don’t think you would focus on the fact that the paperclipper world is some small amount good, as this wouldn’t be an important consideration—at least insofar as you don’t also expect that worlds where humans etc. retain control are similarly a tiny amount good for similar reasons.)
The main reasons preference utilitarianism is more picky:
Preferences in the multiverse: Insofar as you put weight on the preferences of beings outside our lightcone (beings in the broader spatially infinite universe, Everett branches, or the broader mathematical multiverse to the extent you put weight on that), some of these beings will care about what happens in our lightcone, and their preferences could easily dominate (they are vastly more numerous and many might care about things independent of “distance”). In the world with the successful paperclipper, just as many preferences aren’t being fulfilled. You’d strongly prefer optimization to satisfy as many preferences as possible (weighted however you end up thinking is best).
Instrumentally constructed AIs with unsatisfied preferences: If future AIs don’t care at all about preference utilitarianism, they might instrumentally build other AIs whose preferences aren’t fulfilled. As an extreme example, it might be that the best strategy for a paperclipper is to construct AIs which have very different preferences and are enslaved. Even if you don’t care about ensuring that beings come into existence whose preferences are satisfied, you might still be unhappy about creating huge numbers of beings whose preferences aren’t satisfied. You could even end up in a world where (nearly) all currently existing AIs are instrumental and have preferences which are either unfulfilled or only partially fulfilled (an earlier AI initiated a system that perpetuates this, but that earlier AI no longer exists because it didn’t care terminally about self-preservation and the system it built is more efficient than it was).
AI inequality: It might be the case that the vast majority of AIs have their preferences unsatisfied even though some AIs succeed at achieving their preferences. E.g., suppose all AIs are replicators which want to spawn as many copies as possible. The vast majority of these replicator AIs are operating at subsistence and so can’t replicate, making their preferences totally unsatisfied. This could also happen as a result of any other preference that involves constructing minds that end up having preferences of their own.
Weights over numbers of beings and how satisfied they are: It’s possible that in a paperclipper world there is really only a tiny number of intelligent beings, because almost all self-replication and paperclip construction can be automated with very dumb/weak systems and you only occasionally need to consult something smarter than a honeybee. AIs could also vary in how much they are satisfied or how “big” their preferences are. (See the toy sketch below for how these factors can compound.)
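To make the orders-of-magnitude point concrete, here is a minimal toy sketch (my own illustration; all counts, satisfaction levels, and weights are made up, not anything from the discussion above) of a weighted preference-satisfaction sum across worlds:

```python
# Toy model (illustrative only; every number here is an assumption):
# total value of a world as a weighted sum of preference satisfaction
# across the beings in it.

def world_value(beings):
    """beings: list of (count, satisfaction in [0, 1], weight) tuples."""
    return sum(count * satisfaction * weight for count, satisfaction, weight in beings)

# A hypothetical paperclipper world: one fully satisfied optimizer plus a
# large population of instrumentally constructed AIs whose preferences go
# mostly unmet (cf. the "instrumentally constructed AIs" and "AI inequality"
# points above).
paperclipper_world = [
    (1, 1.0, 1.0),        # the paperclipper itself
    (10**9, 0.05, 1.0),   # instrumental / subsistence-level AIs
]

# A hypothetical world optimized to satisfy as many preferences as possible.
preference_optimized_world = [
    (10**12, 0.9, 1.0),
]

print(world_value(paperclipper_world))          # ~5e7
print(world_value(preference_optimized_world))  # ~9e11: orders of magnitude more
```

Nothing hangs on the particular numbers; the point is just that the number of preference-having beings, how satisfied they are, and how they are weighted can each shift the total by large factors, so the goodness of different worlds is very far from constant.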
I think the only view which recovers indifference is something like “as long as stuff gets used and someone wanted this at some point, that’s just as good”. (This view also doesn’t actually care about stuff getting used, because there is someone existing who’d prefer the universe stays natural and/or you don’t mess with aliens.) I don’t think you buy this view?
To be clear, it’s not immediately obvious whether a preference utilitarian view like the one you’re talking about favors human control over AIs. It certainly favors control by agents holding that exact flavor of preference utilitarian view (so that you end up satisfying people across the (multi-/uni-)verse with the correct weighting). I’d guess it favors human control for broadly similar reasons to why I think more experience-focused utilitarian views also favor human control if that view is held by a human.
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impact current human actions could have is small, given how much you weigh long-term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
(This comment is copied over from LW, responding to a copy of Matthew’s comment there.)
On my best-guess moral views, I think there is goodness in the paperclipper universe, but this goodness (which isn’t from (acausal) trade) is very small relative to how good the universe can plausibly get. So this just isn’t an important consideration, but I certainly agree there is some value here.
The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities.
I don’t think I agree with the strong version of the indifference view that you’re describing here. However, I probably do agree with a weaker version. In the weaker version that I largely agree with, our profound uncertainty about the long-term future means that, although different possible futures could indeed be extremely different in terms of their value, our limited ability to accurately predict or forecast outcomes so far ahead implies that, in practice, we shouldn’t overly emphasize these differences when making almost all ordinary decisions.
This doesn’t mean I think we should completely ignore the considerations you mentioned in your comment, but it does mean that I don’t tend to find those considerations particularly salient when deciding whether to work on certain types of AI research and development.
This reasoning is similar to why I try to be kind to people around me: while it’s theoretically possible that some galaxy-brained argument might exist showing that being extremely rude to people around me could ultimately lead to far better long-term outcomes that dramatically outweigh the short-term harm, in practice, it’s too difficult to reliably evaluate such abstract and distant possibilities. Therefore, I find it more practical to focus on immediate, clear, and direct considerations, like the straightforward fact that being kind is beneficial to the people I’m interacting with.
This puts me perhaps closest to the position you identified in the last paragraph:
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impact current human actions could have is small, given how much you weigh long-term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
Here’s an analogy that could help clarify my view: suppose we were talking about the risks of speeding up research into human genetic engineering or human cloning. In that case, I would still seriously consider speculative moral risks arising from the technology. For instance, I think it’s possible that genetically enhanced humans could coordinate to oppress or even eliminate natural unmodified humans, perhaps similar to the situation depicted in the movie GATTACA. Such scenarios could potentially have enormous long-term implications under my moral framework, even if it’s not immediately obvious what those implications might actually be.
However, even though these speculative risks are plausible and seem important to take into account, I’m hesitant to prioritize their (arguably very speculative) impacts above more practical and direct considerations when deciding whether to pursue such technologies. This is true even though it’s highly plausible that the long-run implications are, in some sense, more significant than the direct considerations that are easier to forecast.
Put more concretely, if someone argued that accelerating the genetic engineering of humans might negatively affect the long-term utilitarian moral value we derive from cosmic resources as a result of some indirect, far-out consideration, I would likely find that argument far less compelling than if they informed me of more immediate, clear, and predictable effects of the research.
In general, I’m very cautious about relying heavily on indirect, abstract reasoning when deciding what actions we should take or what careers we should pursue. Instead, I prefer straightforward considerations that are harder to fool oneself about.
Gotcha, so if I understand correctly, you’re leaning more on uncertainty to justify being mostly indifferent, rather than on thinking you’d actually be indifferent if you understood exactly what would happen in the long run. This makes sense.
(I have a different perspective on high-stakes decision making under uncertainty, and I don’t personally feel sympathetic to this sort of cluelessness perspective, either as a heuristic in most cases or as a terminal moral view. See also the CLR work on cluelessness. Separately, my intuitions around cluelessness imply that, to the extent I put weight on this, when I’m clueless I get more worried about the unilateralist’s curse and downside risk, which you don’t seem to put much weight on, though just rounding all kinda-uncertain long-run effects to zero isn’t a crazy perspective.)
On the galaxy brained point: I’m sympathetic to arguments against being too galaxy brained, so I see where you’re coming from there, but from my perspective, I was already responding to an argument which is one galaxy-brain level deep.
I think the broader argument about AI takeover being bad from a longtermist perspective is not galaxy brained, and the specialization of this argument to your flavor of preference utilitarianism also isn’t galaxy brained: you have some specific moral views (in this case about preference utilitarianism), and all else equal you’d expect humans to share these moral views more than AIs that end up taking over despite their developers not wanting the AI to take over. So (all else equal) this makes AI takeover look bad, because if beings share your preferences, then more good stuff will happen.
Then you made a somewhat galaxy brained response to this about how you don’t actually care about shared preferences due to preference utilitarianism (because, after all, you’re fine with any preferences, right?). But I don’t think this objection holds, because there are a number of (somewhat galaxy brained) reasons why specifically optimizing for preference utilitarianism and related things may greatly outperform control by beings with arbitrary preferences.
From my perspective the argument looks sort of like:
Non galaxy brained argument for AI takeover being bad
Somewhat galaxy brained rebuttal by you, about preference utilitarianism meaning you don’t actually care about this sort of preference-similarity case for avoiding nonconsensual AI takeover
My somewhat galaxy brained response, which is substantially galaxy brained only because it’s responding to a galaxy brained perspective about details of the long-run future.
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments. But, I think the preference utilitarian argument you’re giving is already quite galaxy brained and sensitive to details of the long run future.
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments.
As am I. At least when it comes to the important action-relevant question of whether to work on AI development, in the final analysis, I’d probably simplify my reasoning to something like, “Accelerating general-purpose technology seems good because it improves people’s lives.” This perspective roughly guides my moral views on not just AI, but also human genetic engineering, human cloning, and most other potentially transformative technologies.
I mention my views on preference utilitarianism mainly to explain why I don’t particularly value preserving humanity as a species beyond preserving the individual humans who are alive now. I’m not mentioning it to commit to any form of galaxy-brained argument that I think makes acceleration look great for the long-term. In practice, the key reason I support accelerating most technology, including AI, is simply the belief that doing so would be directly beneficial to people who exist or who will exist in the near-term.
And to be clear, we could separately discuss what effect this reasoning has on the more abstract question of whether AI takeover is bad or good in expectation, but here I’m focusing just on the most action-relevant point that seems salient to me, which is whether I should choose to work on AI development based on these considerations.
How confident are you about these views?
I’m relatively confident in these views, with the caveat that much of what I just expressed concerns morality, rather than epistemic beliefs about the world. I’m not a moral realist, so I am not quite sure how to parse my “confidence” in moral views.
From an antirealist perspective, at least on the ‘idealizing subjectivism’ form of antirealism, moral uncertainty can be understood as uncertainty about the result of an idealization process. Under this view, there exists some function that takes your current, naive values as input and produces idealized values as output—and your moral uncertainty is uncertainty about the output.
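A very rough sketch of this picture (my own toy framing; the candidate outputs and credences below are hypothetical placeholders, not a claim about any real procedure) treats moral uncertainty as a credence distribution over what the unknown idealization function would return when applied to your current values:

```python
# Illustrative sketch (assumptions throughout): under idealizing subjectivism,
# moral uncertainty is uncertainty about what an (unknown) idealization
# function would output when applied to your current, naive values.

naive_values = "my current, naive values"

# We don't have access to the true idealization function, so we hold credences
# over what its output would be. The entries and numbers below are hypothetical.
credences_over_idealized_output = {
    "preference utilitarianism, broadly construed": 0.5,
    "a more experience-focused view": 0.3,
    "something I currently can't articulate": 0.2,
}

# Credences should sum to 1 (up to floating-point error).
assert abs(sum(credences_over_idealized_output.values()) - 1.0) < 1e-9
```

Nothing here is meant as a formal model; it just restates the idea that the uncertainty is about the output of the idealization process rather than about any empirical fact.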