## Wednesday, June 15, 2016

### morris_AIMA [#9]: Chapter 2 Solution Set [unofficial]

This is a follow-up to the chapter 1 solution set for Artificial Intelligence: A Modern Approach (3rd Ed).

Disclaimer: use these solutions purely for personal enjoyment/comparison with your own. I completed this during the 2nd semester of my 2nd year of computer science, so I make no claim to be insightful or intelligent on these matters. These are just my attempts to follow along with the book to the best of my ability. Hopefully I'll look back here in a few years and see how far I've come.
Notes: questions I got wrong are highlighted in red. I felt it would be nice to keep my original solutions anyway, just for looking back on them.
2.1
With respect to the vacuum example, consider an atomic model with $T$ discrete states, all independent, where the performance measure is the sum of clean squares across these states. If we consider a world with, say, an $n \times n$ grid where $n \geq 2$, there may be an environment state where the vacuum is on a clean tile with no dirty tile adjacent to it, but there is dirt in the distance. A rational agent would head directly for such a dirty tile and clean it to maximize the performance measure, but its ability to do so depends on how many time steps are allowed, as it may only move one adjacent tile per time step. In short: an agent may act rationally and maximize the performance measure in the long run while making seemingly irrational intermediate decisions. As the chapter mentions, we must consider the difference between an agent (say, a vacuum) that cleans much faster in the short run but takes breaks versus an agent that cleans steadily without breaks. Measured at a specific time step, both agents may yield the same average cleanliness while one of them has a much higher or lower performance measure at that step.

2.2
Part a
Since there are very limited variables to consider this can be proved exhaustively.
case 1: Both tiles A and B are dirty. Then regardless of the agent's location it will clean, increasing the performance measure by 1.

case 2: Exactly one of the tiles A or B is dirty. If the robot is on the dirty tile it will clean it, increasing the performance measure by 1; otherwise it will move to the dirty tile (+0), which is still the maximum increase possible in this case.

Since the performance measure doesn't punish movement, and the robot cannot know whether a tile is dirty until it reaches it, the agent makes the optimal decision in every possible state. The sum over the time steps (the performance measure in this case) is therefore maximized, so the agent is rational.
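The case analysis above can be written directly as a simple reflex agent. This is a minimal sketch in Python (my own function and string names, not the book's code):

```python
def reflex_vacuum_agent(location, status):
    """Map the current percept (location, status) directly to an action."""
    if status == "Dirty":
        return "Suck"  # cases 1 and 2: cleaning always gains +1
    # clean tile: move toward the other tile, which costs nothing
    return "Right" if location == "A" else "Left"

# Exhaustive check over all four possible percepts:
for loc in ("A", "B"):
    for st in ("Clean", "Dirty"):
        print(loc, st, "->", reflex_vacuum_agent(loc, st))
```

Because the percept space is so small, the exhaustive proof and the exhaustive test are the same loop.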

Part b
The agent should clean the tile it is on if there is dirt, then move to the adjacent tile and clean it if there is dirt. This makes the agent sequential rather than episodic as before, because it needs to keep track of how many times it has moved. If it has moved 0 times and the current tile is clean, move left/right and increment the move counter; if the tile is clean and it has already moved, do nothing. This ensures all tiles end up clean and maximizes the long-run performance measure. Since the task is sequential there is internal state, namely the number of movements. There is a special case: if both tiles are originally clean, then moving once yields a long-term performance measure 1 point lower than if the agent had stayed still; but this is a small sacrifice compared to how inefficient staying still would be in the other cases (for example, if its current tile were clean but the adjacent one were not, a stationary agent would accumulate only +1 per environment state instead of sacrificing 1 to move and then earning +2 for all future states).

Part c)
Assuming movement doesn't cost 1 point as in part b, the agent should explore by moving in one direction at a time. Give it a variable tracking direction, initially set to left; once it can no longer move left, flip the variable to indicate it must move right (until it can't, and so on). While doing this, it should clean the current tile if there is dirt on it, and otherwise move in the indicated direction. This maximizes expected performance: given some unknown geography with $N$ slots, the agent sweeps over and cleans all of them repeatedly. There are more advanced approaches, for example using machine learning to recognize a distribution if the dirt is placed non-randomly; but I'm assuming it is random, in which case learning doesn't help and sweeping left/right covers everything. For a truly optimal solution learning does make sense, as the agent could identify which spots get dirty most often, target them, and perhaps predict when they will be dirty next.
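The direction-flipping sweep can be sketched as follows. This assumes (my assumption, not stated in the exercise) that the agent perceives a bump when it tries to move past an edge of the strip:

```python
class SweepingVacuumAgent:
    """Sweeps left until it bumps an edge, then right, and so on,
    cleaning any dirt it encounters along the way."""

    def __init__(self):
        self.direction = "Left"  # internal state: current sweep direction

    def act(self, status, bumped):
        if status == "Dirty":
            return "Suck"
        if bumped:  # hit an edge on the last move: reverse the sweep
            self.direction = "Right" if self.direction == "Left" else "Left"
        return self.direction
```

With only one bit of state (the direction), the agent is guaranteed to visit every slot of any finite strip over and over.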

2.3
Part a)
False: for an agent to be rational, it should take an action that maximizes its expected performance measure given the percept sequence. Even if it doesn't know of other variables in the environment, it may act rationally based on what it does know. This is like the road-crossing example in the chapter: if you look both ways for a vehicle before crossing, but die because an anvil falls on your head, crossing the road wasn't irrational, because you acted in a way that maximized expected performance.
Part B)
True: given an environment requiring the agent to act sequentially rather than episodically, a reflex-based agent only looks at the current percept and will eventually fail to make an optimal decision that would have required analysing previous percepts.
Part C)
True: since the task environment includes the performance measure in its description, one may define a performance measure that is arbitrarily maximized by any agent regardless of its actions. For example, the performance measure could be "to be" or "to exist", which any agent placed in such an environment satisfies. More concretely, given an environment with one tile, where the agent must exist on that tile, and a performance measure that increases as long as the agent is on it, all agents are technically rational.
Part D)
False: the input to an agent program is raw physical sensory data, interpreted by the encoding within the agent/machine. The agent function takes abstract values representing the states the agent has experienced via percepts/percept sequences; it has no exposure to the real world. The function is a level of abstraction above the program.
Part E) (answer is actually False)
True: the implementation may vary but is merely a concrete representation of the function's co-domain of "actions". These actions may manifest in a variety of ways depending on the implementation, as they are just abstract concepts. For example, the action "jump" has meaning to us as humans: we, being machines, interpret "jump" as crouching the body and thrusting ourselves into the air; jumping for a different machine may look different depending on its actuators/physical properties.
Part F)
True: again, given a rule that maximizes the performance measure for any action, such an agent would be rational. More specifically, if there exists a task environment with only one possible action for any given percept sequence, the random selection will always yield the only available option. Not all agents of this type are rational; but there exists an environment that fits this agent.
Part G)
True: Let one task environment $T_1$ have all the same properties as another $T_2$, except that in $T_1$ there is an extra sensor. If this sensor reads in information but has no effect on the actions the agent takes, then a rational agent for $T_1$ is also rational for $T_2$.
Part H) (actually false)
True: this follows from the fact that the agent must maximize the performance measure given the evidence/experience it has. If every percept is "null", then the agent maps null to some action through the agent function, and this is its only available choice; whatever performance measure results is therefore both the lowest and the highest achievable, so the agent trivially maximizes it.
Part I)
False: it will maximize its chance of not losing, but probabilistically there is always a chance it will lose even when acting optimally.

2.4
a)
Performance measure: goals scored, time with ball moving forward
Environment: soccer field with other robots playing.
Actuators: Leg/kicking tool for moving/kicking
Sensors: vision, touch sensors for contact with ball
[Partially observable, multi-agent, stochastic, episodic, dynamic, continuous, known]
b)
Performance measure: area covered, stable average depth/pressure
Environment: oceans of Titan, underwater
Actuator: flippers/swimming object, grabbers to collect, waterproof protection
Sensors: visual sensing, depth/pressure
[partially observable,  single-agent, stochastic, sequential, dynamic, continuous, known]
c)
Performance measure: books located/bought, % of searches that find books
Environment: internet, websites
Actuators: packet data/web requests to servers
Sensors: keyboard input from user, reading web page data
[partially observable, single-agent, non-deterministic (uncertain), sequential, dynamic, discrete, known]
d)
Performance measure: points scored on opponent
Environment: tennis court
Sensors: touch on racket, visual ability
[fully observable (depending on quality of vision), multi-agent, deterministic, sequential, dynamic, continuous, known]
e)
Same as d, except single agent now
f)
Performance measure: height jumped
environment: clear ground surface with open space above to jump into
actuators: jumping device
sensors: accelerometer, vision
[partially observable, single-agent, stochastic, sequential, static, continuous, known]
g)
Performance measure: speed of creation of proper sweater
actuators: moving arms/device for knitting
sensors: visual
[single agent, dynamic, continuous, stochastic, fully observable]
h)
performance measure: money saved as opposed to purchasing outside of auction
environment: auction house
actuators: arms to raise card, voice perhaps
sensors: visual
[multi-agent, dynamic, continuous, partially observable (depending), stochastic]

2.5
agent: an entity that acts in an environment by reading in sensory information via sensors and acting via actuators.
agent function: an abstract mathematical description of an agent's response to any given percept sequence; it maps a percept sequence to an action.
agent program: the implementation of an agent function
autonomy: the ability to acquire previously unknown information from an environment and alter internal behaviour based on it.
reflex agent: reacts directly to a given percept without considering previous percepts/sequence.
model-based agent: an agent that keeps internal state tracking its experiences in the environment to build a model of how the world behaves.
goal-based agent: an agent that keeps a goal in mind and chooses actions in order to move nearer to/reach that goal.
utility-based agent: an agent that tracks an internal utility value and seeks to maximize its utility and minimize loss, like how humans behave economically.
learning agent: an agent that is able to increase its performance measure in an environment fully autonomously.

2.6
a)
Yes; the behaviour or implementation may just be different. I could write a program using a different ADT, for example, or simply adjust its form/structure.
b)
An agent function that depends on percepts the agent's sensors cannot access would be impossible to write a program for.
c) Yes
d) n bits of storage means $2^n$ possible agent programs.
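The count in part d is just the number of distinct contents an n-bit memory can hold. A trivial sketch of the arithmetic (my own function name):

```python
def num_agent_programs(n_bits):
    """n bits of storage admit 2**n distinct memory contents,
    hence at most 2**n distinct agent programs."""
    return 2 ** n_bits

for n in (1, 2, 8):
    print(n, "bits ->", num_agent_programs(n), "possible programs")
```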
e) No? It should still read in environment data and act accordingly.
2.7: skipped, ended up focusing on 2.8 more, which made this question trivial.
2.8 & 2.9

2.10
a)
No, because in a partially observable environment the agent will continuously move back and forth, which lowers its performance measure.
b)
Yes, a reflex agent with state can be perfectly rational. By tracking the tiles it visits, it can check whether it is about to move back to a known-clean tile, and decide whether or not to move based on that.
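The tile-tracking idea can be sketched for the two-tile world with a movement penalty. This is a hypothetical sketch (my own names); the set of known-clean tiles is the internal state:

```python
class StatefulVacuumAgent:
    """Reflex agent with state: remembers which tiles it has seen clean,
    so it never pays the movement penalty to revisit one."""

    def __init__(self):
        self.known_clean = set()  # internal model of the world

    def act(self, location, status):
        if status == "Dirty":
            self.known_clean.add(location)  # it will be clean after sucking
            return "Suck"
        self.known_clean.add(location)
        other = "B" if location == "A" else "A"
        if other in self.known_clean:
            return "NoOp"  # don't pay a point to revisit a known-clean tile
        return "Right" if location == "A" else "Left"
```

Once both tiles appear in `known_clean`, the agent idles forever and stops losing points to movement.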
c)
If the environment is fully observable in this case, the agent can be perfectly rational in both part a and part b, since it always knows where the dirt is and never wastes movement.

2.11
a)
No, a simple reflex agent cannot be perfectly rational, as it won't know when to avoid obstacles nor when it is at a boundary of the environment. It also wouldn't know when it is best to turn left/right vs up/down.
b)
I'd say a randomized agent will do better in the long run. The reason is that, even with a percept giving adjacent-tile info, a simple reflex agent will end up oscillating back and forth once it is surrounded by only clear tiles. Randomization allows all tiles to eventually be reached under some probability distribution.
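A randomized reflex agent is a one-liner on top of the clean/dirty check. A minimal sketch, assuming the four compass moves as the action set (my assumption):

```python
import random

def random_reflex_agent(status, rng=random):
    """Suck if dirty; otherwise move in a uniformly random direction,
    which breaks the oscillation a deterministic reflex agent falls into."""
    if status == "Dirty":
        return "Suck"
    return rng.choice(["Up", "Down", "Left", "Right"])
```

Passing `rng` explicitly makes the agent reproducible in tests (e.g. `random.Random(0)`), while the default stays truly random.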
c)
Have dirt scattered far apart: it will take a long time for a random walk to reach all of it. A large grid with no walls and sparsely scattered dirt will take the randomized agent much longer than an agent that knows how to search for dirt.
d)
Yes it can. For example, an agent could track which tiles are dirty/clean as it reaches them, and then use path searching to travel to tiles that have not been checked yet.
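The "path searching" step above can be a breadth-first search from the agent's position to the nearest unchecked tile. A hypothetical sketch, where `passable` is an assumed callback reporting which squares the map allows (all names are mine):

```python
from collections import deque

def path_to_nearest_unchecked(start, checked, passable):
    """BFS outward from `start`; return the list of steps to the nearest
    tile not in `checked`, or None if everything reachable is checked."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        pos, path = frontier.popleft()
        if pos not in checked:
            return path  # BFS guarantees the first hit is a nearest one
        x, y = pos
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt not in seen and passable(nxt):
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None
```

The agent would follow the returned path, mark each visited tile as checked, and repeat until the search returns None.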

2.12
The agent now has no sense of localization, so a reasonable way to behave is to act as a randomized reflex agent: move randomly, and clean the current tile whenever it is dirty, in case there is dirt on it.

2.13
a) If there is dirt on a tile and Suck fails to clean the floor, the agent still remains on that tile, so if it is reflex-based it will simply suck again on the next time step. The dirt sensor failing 10% of the time can be handled by checking 2-3 times instead of once: take several samples and decide based on them.
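The resampling fix can be sketched as a majority vote over an odd number of reads. With independent 10% errors, three reads give a wrong majority only when at least two fail: $3(0.1)^2(0.9) + (0.1)^3 = 2.8\%$ instead of 10%. A minimal sketch, where `sense_dirt` stands in for the (assumed) raw sensor callback:

```python
def majority_dirt_reading(sense_dirt, samples=3):
    """Read the noisy dirt sensor `samples` times (use an odd count)
    and report the majority verdict."""
    votes = sum(1 for _ in range(samples) if sense_dirt())
    return votes > samples // 2
```

More samples drive the error rate down further, at the cost of extra time per tile.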
b) Personally, the optimal decision, assuming there are no walls, is to have the agent move to the top-left corner of the grid it is in, then sweep across all the rows/columns in order, and afterwards do the same in the other direction. This ensures all tiles are rechecked at the same frequency.
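The sweep order described above is a boustrophedon (serpentine) path: each row is traversed in the opposite direction from the previous one, so consecutive tiles are always adjacent. A minimal sketch generating that visiting order for an assumed rows x cols grid:

```python
def sweep_order(rows, cols):
    """Return the serpentine visiting order for a rows x cols grid:
    even rows left-to-right, odd rows right-to-left."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order

print(sweep_order(2, 3))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

Repeating the order forward and then reversed gives the back-and-forth cycle described above, with every tile checked once per pass.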