(TensorFlow) input pipelines / Threading and Queue

Posted Apr 22, 2017 Updated Nov 5, 2024

By umbum

input pipeline(guide)

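An effective way to read files in TensorFlow is to build an input pipeline. The code in this post uses the TensorFlow 1.x queue-based API. In TensorFlow 2 these functions are only available under tf.compat.v1, and tf.data is the recommended replacement. An input pipeline consists of the following steps.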

  
import glob
import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TensorFlow 2)

## step 1 : collect the input filenames
fnames = glob.glob("../sctf_asm/imgs/*")

## step 2 : create a FIFO queue and fill it with the filenames.
## string_input_producer also supports shuffling and an epoch limit.
fname_queue = tf.train.string_input_producer(fnames)

## step 3 : set up the reader that matches the file format
reader = tf.WholeFileReader()
fname, content = reader.read(fname_queue)

## step 4 : decode
image = tf.image.decode_png(content, channels=1)
## decode_image would be more general here, but its output has shape=<unknown>,
## so resize_images below fails with
## ValueError: 'images' contains no shape.

## step 5 : optional preprocessing (resize, batch, ...)
image = tf.cast(image, tf.float32)
resized_image = tf.image.resize_images(image, [28, 28])
image_batch = tf.train.batch([resized_image], batch_size=5)
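Step 2 hides the most machinery. The sketch below is a rough Python analogy of what string_input_producer does. It uses only the standard library and is not TensorFlow code. A background thread pushes the whole filename list into a bounded FIFO queue once per epoch and reshuffles it every epoch. After num_epochs epochs it signals the end. The names filename_producer, drain and the _END sentinel are hypothetical. The defaults num_epochs=None, shuffle=True and capacity=32 mirror string_input_producer's own defaults.

```python
import queue
import random
import threading

_END = object()  # sentinel standing in for "queue closed" (OutOfRangeError in TF)


def filename_producer(fnames, num_epochs=None, shuffle=True, capacity=32, seed=None):
    """Rough stdlib analogy of tf.train.string_input_producer (hypothetical helper).

    A daemon thread enqueues the full filename list once per epoch,
    reshuffling it each epoch. With num_epochs=None it never stops;
    otherwise it enqueues a sentinel after the last epoch.
    """
    q = queue.Queue(maxsize=capacity)
    rng = random.Random(seed)

    def run():
        epoch = 0
        while num_epochs is None or epoch < num_epochs:
            names = list(fnames)
            if shuffle:
                rng.shuffle(names)  # a fresh order for every epoch
            for name in names:
                q.put(name)  # blocks while the bounded queue is full
            epoch += 1
        q.put(_END)

    threading.Thread(target=run, daemon=True).start()
    return q


def drain(q):
    """Dequeue until the producer signals the end (like catching OutOfRangeError)."""
    out = []
    while True:
        item = q.get()
        if item is _END:
            return out
        out.append(item)


if __name__ == "__main__":
    # two epochs over three files: each name appears twice, reshuffled per epoch
    print(drain(filename_producer(["a.png", "b.png", "c.png"], num_epochs=2)))
```

In real TensorFlow, setting num_epochs creates a local epoch-counter variable. You therefore have to run sess.run(tf.local_variables_initializer()) before the queue runners start. Once the epochs are used up, a dequeue raises tf.errors.OutOfRangeError, which is the role the sentinel plays here.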

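An input pipeline built this way is made of queues, so you have to use QueueRunners. The threads that fill those queues must be started before anything can be read from them.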

  
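sess = tf.Session()
coord = tf.train.Coordinator()
# start a thread for every QueueRunner registered in the graph;
# without this, the sess.run() below would block forever on an empty queue
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
sess.run(resized_image)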

Threading and Queue(guide)

QueueRunner(API)

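Queues are a TensorFlow mechanism for computing tensors asynchronously with multiple threads. You can create a QueueRunner yourself with tf.train.QueueRunner and handle the enqueue and dequeue operations directly, but this is usually unnecessary. Methods such as tf.train.string_input_producer() return a queue and automatically add a QueueRunner for it to the current graph. In practice, then, you only need to start the QueueRunners that have been added.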

  
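# qr: a tf.train.QueueRunner you built yourself (only needed for hand-made queues)
tf.train.add_queue_runner(qr)            # add qr to the graph's QUEUE_RUNNERS collection
tf.train.start_queue_runners(sess=sess)  # run every QueueRunner in that collection as threads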
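To see what "starting a QueueRunner" amounts to, here is an analogy built from plain Python threads and queue.Queue rather than TensorFlow. A runner holds a queue and several enqueue operations. Starting it means launching one thread per enqueue operation, and each thread keeps filling the queue. SimpleQueueRunner, next_number and stop_event are hypothetical names. The real tf.train.QueueRunner also runs its enqueue ops inside a Session and closes the queue once its input is exhausted, and this sketch leaves both out.

```python
import itertools
import queue
import threading


class SimpleQueueRunner:
    """Stdlib analogy of tf.train.QueueRunner (hypothetical class, not the TF API).

    Holds a queue plus a list of "enqueue ops" (plain callables that each
    produce one item). create_threads() starts one daemon thread per op;
    each thread keeps producing items and pushing them into the queue
    until stop_event is set.
    """

    def __init__(self, q, enqueue_ops):
        self.q = q
        self.enqueue_ops = list(enqueue_ops)

    def create_threads(self, stop_event):
        threads = []
        for op in self.enqueue_ops:
            t = threading.Thread(target=self._run, args=(op, stop_event), daemon=True)
            t.start()
            threads.append(t)
        return threads

    def _run(self, op, stop_event):
        while not stop_event.is_set():
            item = op()
            # put() with a timeout so a full queue cannot stop the thread from
            # noticing stop_event; the item is retried, not dropped, until stop
            while not stop_event.is_set():
                try:
                    self.q.put(item, timeout=0.05)
                    break
                except queue.Full:
                    pass


if __name__ == "__main__":
    counter = itertools.count()
    lock = threading.Lock()

    def next_number():  # stands in for an enqueue op such as "read one file"
        with lock:
            return next(counter)

    q = queue.Queue(maxsize=8)
    runner = SimpleQueueRunner(q, [next_number, next_number])
    try:
        q.get(timeout=0.1)  # nothing is filling the queue yet
    except queue.Empty:
        print("queue stays empty until the runner threads are started")
    stop = threading.Event()
    threads = runner.create_threads(stop)  # the analogue of start_queue_runners()
    print([q.get() for _ in range(5)])
    stop.set()
    for t in threads:
        t.join()
```

The empty-queue check in the demo shows the same effect as forgetting start_queue_runners in TensorFlow. Here a timeout turns it into an exception, but in TensorFlow the sess.run() call simply waits.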

Coordinator

tf.train.Coordinator() returns a coordinator that coordinates the termination of multiple threads. Queues run in multiple threads, so you need a Coordinator to stop them cleanly.

  
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

sess.run(...)

# ask (request) all the threads to stop
coord.request_stop()

# wait for all the threads to terminate
coord.join(threads)

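If you pass the coordinator coord when creating threads (for example to start_queue_runners), you can control the termination of all threads linked to that coordinator at once. Think of it as a shutdown group. Calling coord.request_stop() asks every thread to stop. Once it has been called, coord.should_stop() returns True in every thread, so any thread can use should_stop() to check whether some other thread has requested a stop. The stop is cooperative: request_stop() does not kill threads. It only sets the flag that each thread's loop checks.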
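The coordinator contract above has three parts: request_stop() sets a shared flag, should_stop() polls it, and join() waits. It can be mimicked with threading.Event. The sketch below is a stdlib analogy, not tf.train.Coordinator itself. SimpleCoordinator, worker and the limit parameter are hypothetical. In the demo, one worker requests the stop and the other workers stop only because they observe should_stop().

```python
import threading
import time


class SimpleCoordinator:
    """Stdlib analogy of tf.train.Coordinator (hypothetical class, not the TF API).

    request_stop() sets a shared Event, should_stop() polls it, and join()
    waits for the given threads. Nothing is killed: each worker has to
    check should_stop() itself and leave its loop.
    """

    def __init__(self):
        self._stop_event = threading.Event()

    def request_stop(self):
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    def join(self, threads, timeout=None):
        for t in threads:
            t.join(timeout)


def worker(coord, counts, idx, limit=None):
    """Count loop iterations until any thread requests a stop.

    A worker given a limit requests the stop itself after `limit`
    iterations (standing in for e.g. "this thread ran out of input");
    the others stop only because they see should_stop() turn True.
    """
    while not coord.should_stop():
        counts[idx] += 1
        if limit is not None and counts[idx] >= limit:
            coord.request_stop()
        time.sleep(0.001)


if __name__ == "__main__":
    coord = SimpleCoordinator()
    counts = [0, 0, 0]
    threads = [threading.Thread(target=worker, args=(coord, counts, 0, 20))]
    threads += [threading.Thread(target=worker, args=(coord, counts, i)) for i in (1, 2)]
    for t in threads:
        t.start()
    coord.join(threads)
    print(coord.should_stop(), counts)
```

The real tf.train.Coordinator does more. request_stop() also accepts an exception, which join() then re-raises. join() also gives threads a grace period (stop_grace_period_secs) to finish after a stop has been requested.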

Data types passed to the model

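The one_hot option affects only the labels. With one_hot=True a label is a 10-element vector. Without it, a label is just the class index (7 below). The images are a float32 numpy.ndarray either way. In the session below, mnist holds one-hot labels and mnist_no_one_hot holds plain ones.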

  
>>> mnist
Datasets(train=...
>>> type(mnist.test.images)
<class 'numpy.ndarray'>
>>> mnist_no_one_hot.test.labels[0]
7
>>> mnist.test.labels[0]
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.])
>>> mnist.test.images[0]    # == mnist_no_one_hot.test.images[0]
array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        ...
        0.        ,  0.        ,  0.        ,  0.47450984,  0.99607849,
        0.81176478,  0.07058824,  0.        ,  0.        ,  0.        ,
        ...
        0.        ,  0.        ,  0.        ,  0.        ], dtype=float32)
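Because one_hot changes only the label encoding, converting between the two encodings takes one line of NumPy in each direction. The sketch below assumes NumPy is available. to_one_hot and from_one_hot are hypothetical helper names, and num_classes=10 comes from the ten-element label shown above.

```python
import numpy as np


def to_one_hot(labels, num_classes=10):
    """Turn integer class labels (e.g. 7) into one-hot rows.

    Same encoding as the one-hot labels above: a single 1.0 at the class
    index, zeros elsewhere.
    """
    labels = np.asarray(labels, dtype=np.int64)
    return np.eye(num_classes)[labels]  # pick row `label` of the identity matrix


def from_one_hot(rows):
    """Inverse direction: recover the integer label from each one-hot row."""
    return np.argmax(rows, axis=-1)


if __name__ == "__main__":
    print(to_one_hot(7))                          # the one-hot vector for label 7
    print(from_one_hot(to_one_hot([7, 2, 1])))   # back to the integer labels
```

The same argmax also recovers the predicted class from a model's 10-way output vector.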

To run a prediction on an image of your own, you likewise have to convert it into a numpy.ndarray in the same format as these training images.
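Here is a minimal sketch of that conversion. It assumes the model was trained on the MNIST arrays shown above: 784 float32 values per image, scaled into [0, 1]. For example, 0.47450984 is 121/255. It also assumes the input is already a 28x28 grayscale array with values 0 to 255. to_mnist_input is a hypothetical helper name. Loading the file is left out. With Pillow, for example, np.asarray(Image.open(path).convert("L").resize((28, 28))) would produce such an array. MNIST digits are bright strokes on a zero background, so an image with dark ink on a light background probably needs inverting (255 - pixels) first.

```python
import numpy as np


def to_mnist_input(pixels):
    """Convert a 28x28 grayscale image (values 0-255) into a (1, 784) float32 batch.

    Assumes input_data-style MNIST training data: flattened 784-element
    rows, float32, scaled into [0, 1].
    """
    arr = np.asarray(pixels, dtype=np.float32)
    if arr.shape != (28, 28):
        raise ValueError("expected a 28x28 image, got shape %r" % (arr.shape,))
    arr = arr / 255.0                # same [0, 1] scaling as mnist.test.images
    return arr.reshape(1, 28 * 28)   # flatten and add a batch dimension of 1


if __name__ == "__main__":
    blank = np.zeros((28, 28), dtype=np.uint8)
    print(to_mnist_input(blank).shape)
```

You can then feed the result to the model through feed_dict, in place of a batch of mnist.test.images. The exact placeholder depends on how the model was defined.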


This post is licensed under CC BY 4.0 by the author.