python - Apache Spark: "java.net.SocketException: Connection reset" Error on Windows - Stack Overflow


I'm trying to set up Apache Spark on Windows 10, but I keep getting errors when running Spark from VSCode:

# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName('PySpark Sample DataFrame').getOrCreate()

# Define Schema
col_schema = ["Language", "Version"]

# Prepare Data
Data = (("Jdk","17.0.12"), ("Python", "3.11.9"), ("Spark", "3.5.1"),   \
    ("Hadoop", "3.3 and later"), ("Winutils", "3.6"),  \
  )

# Create DataFrame
df = spark.createDataFrame(data = Data, schema = col_schema)
df.printSchema()
df.show(5,truncate=False)

So far I've:

  • Switched to Python 3.11.8: https://stackoverflow.com/a/78157930/5653263

  • Downgraded to JDK 17: https://stackoverflow.com/a/76432417/5653263

I still get the following error:

PS C:\...> & c:/.../python.exe "c:/.../test.py"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/16 16:40:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root
 |-- Language: string (nullable = true)
 |-- Version: string (nullable = true) 

24/11/16 16:40:42 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:328)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:393)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:842)
24/11/16 16:40:42 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (Laptop executor driver): java.net.SocketException: Connection reset
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:328)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:393)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:842)

24/11/16 16:40:42 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "c:\...\test.py", line 18, in <module>
    df.show(5,truncate=False)
  File "C:\...\Python\Python312\Lib\site-packages\pyspark\sql\dataframe.py", line 947, in show
    print(self._show_string(n, truncate, vertical))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\...\Python\Python312\Lib\site-packages\pyspark\sql\dataframe.py", line 978, in _show_string
    return self._jdf.showString(n, int_truncate, vertical)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\...\Python\Python313\Lib\site-packages\py4j\java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "C:\...\Python\Python312\Lib\site-packages\pyspark\errors\exceptions\captured.py", line 179, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "C:\...\Python\Python312\Lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o42.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (Laptop executor driver): java.net.SocketException: Connection reset
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:328)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:393)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:842)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:530)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:483)
        at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
        at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4333)
        at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3316)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4323)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4321)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4321)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:3316)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:3539)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:280)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:315)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:842)
Caused by: java.net.SocketException: Connection reset
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:328)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:393)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        ... 1 more

PS C:\...\> SUCCESS: The process with PID 24468 (child process of PID 3840) has been terminated.
SUCCESS: The process with PID 3840 (child process of PID 29208) has been terminated.
SUCCESS: The process with PID 29208 (child process of PID 29332) has been terminated.

Software Versions:

  • Windows: 10.0.19045.5011 x64

  • JDK: 17.0.12

  • Python: 3.11.8

  • Spark: 3.5.3

  • Hadoop (Winutils): 3.3.6 (https://github.com/cdarlint/winutils/tree/master/hadoop-3.3.6/bin)
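
For reference, here is a quick check of which interpreter PySpark actually picks up (a minimal diagnostic sketch; sys.executable is the driver-side interpreter, and pythonVer is the Python version the session records):

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print("driver interpreter:", sys.executable)                    # the Python running this script
print("session Python version:", spark.sparkContext.pythonVer)  # e.g. "3.11"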


1 Answer

Your output shows File "C:\...\Python\Python312\Lib\site-packages\pyspark\sql\dataframe.py", line 947, which means you are still running the code with Python 3.12; that is what causes the problem.
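
If you don't want to recreate your environment, you can also pin the interpreter Spark launches before the session is created, via the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables. A minimal sketch, assuming a Python 3.11 install at a placeholder path (substitute your own):

# Pin the driver and worker interpreters before the SparkSession is built.
import os
os.environ["PYSPARK_PYTHON"] = r"C:\Python311\python.exe"         # workers; placeholder path
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\Python311\python.exe"  # driver; placeholder path

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('PySpark Sample DataFrame').getOrCreate()
print(spark.sparkContext.pythonVer)  # should now report 3.11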

I used conda to create a Python 3.11.8 environment to test your code; it works as shown below:

(base) R:\>conda create -n connection-rest2 python=3.11.8

(base) R:\>conda activate connection-rest2

(connection-rest2) R:\>python -V
Python 3.11.8

(connection-rest2) R:\>pip install pyspark

(connection-rest2) R:\>python test.py
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/30 21:24:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root
 |-- Language: string (nullable = true)
 |-- Version: string (nullable = true)

+--------+-------------+
|Language|Version      |
+--------+-------------+
|Jdk     |17.0.12      |
|Python  |3.11.9       |
|Spark   |3.5.1        |
|Hadoop  |3.3 and later|
|Winutils|3.6          |
+--------+-------------+

By the way, you may also get a "Missing Python executable 'python3'" warning; to fix that, copy python.exe to python3.exe in the 'connection-rest2' virtual environment directory.
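
For example, from the activated prompt (a sketch assuming CONDA_PREFIX, which conda sets on activation, points at the environment root):

(connection-rest2) R:\>copy "%CONDA_PREFIX%\python.exe" "%CONDA_PREFIX%\python3.exe"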
